Abstract. The best current estimates of the thresholds for the existence of solutions in random constraint
satisfaction problems ('CSPs') mostly derive from the first and the second moment method. Yet apart from a
very few exceptional cases these methods do not quite yield matching upper and lower bounds. According to
deep but non-rigorous arguments from statistical mechanics, this discrepancy is due to a change in the
geometry of the set of solutions called condensation that occurs shortly before the actual threshold for the
existence of solutions (Krzakala, Montanari, Ricci-Tersenghi, Semerjian, Zdeborova: PNAS 2007). To cope with
condensation, physicists have developed a sophisticated but non-rigorous formalism called Survey Propagation
(Mezard, Parisi, Zecchina: Science 2002). This formalism yields precise conjectures on the threshold values
of many random CSPs. Here we develop a new Survey Propagation inspired second moment method for the random
$k$-NAESAT problem, which is one of the standard benchmark problems in the theory of random CSPs. This new
technique allows us to overcome the barrier posed by condensation rigorously. We prove that the threshold for
the existence of solutions in random $k$-NAESAT is
$2^{k-1}\ln 2 - (\frac{\ln 2}{2} + \frac14) + \varepsilon_k$, where $|\varepsilon_k| \le 2^{-(1-o_k(1))k}$,
thereby verifying the statistical mechanics conjecture for this problem.
1 Introduction



Over the past decade, physicists have developed sophisticated but non-rigorous techniques for the study of
random constraint satisfaction problems ('CSPs') such as random $k$-SAT or random graph $k$-coloring [27,29].
This work has led to a remarkably detailed conjectured picture, according to which various phase transitions
affect both the combinatorial and computational nature of random problems. By now, some of these predictions
have been turned into rigorous theorems. Examples include results on the "shattering" of the solution
space [1,7], work on (non-)reconstruction and sampling [18,24,30], and even new algorithms for random
CSPs [9,19]. Many of these contributions have led to the development of new rigorous techniques. Indeed,
it seems fair to say that, combined, these results have advanced our understanding of random CSPs quite
significantly.

However, thus far substantial bits of the statistical mechanics picture have eluded all rigorous attempts.
Perhaps most importantly, apart from a very few special cases, the precise thresholds for the existence of
solutions in random CSPs have not been pinned down exactly. While rigorous upper and lower bounds can be
derived via the first and the second moment method lH, these bounds do not quite match in most examples,
including prominent ones such as random $k$-SAT or random graph $k$-coloring. In fact, the statistical
mechanics techniques suggest a striking explanation for this discrepancy, namely the existence of a
condensation phase shortly before the threshold for the existence of solutions. In this phase, a crucial
necessary condition for the success of the (standard) second moment method is violated. Indeed, in statistical
mechanics a deep formalism called Survey Propagation ('SP') has been developed expressly to deal with
condensation. While SP is primarily an analysis technique, a spin-off has been the SP guided decimation
algorithm, which seems highly successful at solving random CSPs experimentally.

In this paper we propose a new SP-inspired second moment method that allows us to overcome the barrier
posed by condensation. The specific problem that we work with is random $k$-NAESAT, one of the standard
benchmark problems in the theory of random CSPs. Random $k$-NAESAT is technically a bit simpler than
random $k$-SAT due to a certain symmetry property, but computationally and structurally both problems have
strong similarities. We determine the threshold for the existence of solutions in random $k$-NAESAT up to an
additive error that tends to zero exponentially with $k$. This is the first time that the threshold in any
random CSP of this type can be calculated with such accuracy. While from a technical viewpoint $k$-NAESAT is
perhaps the simplest example of a random CSP that exhibits condensation, our proof technique rests on
a rather generic approach. Therefore, we believe that with additional technical work our approach can be
extended to many other problems, including random $k$-SAT or random graph $k$-coloring.

To define random $k$-NAESAT formally, let $k \ge 3$ and $n > 0$ be integers and let $V = \{x_1, \ldots, x_n\}$
be a set of Boolean variables. For a fixed real $r > 0$ we let $m = m(n) = \lceil rn \rceil$. Further, let
$\Phi = \Phi_k(n, m)$ be a propositional formula obtained by choosing $m$ clauses of length $k$ over $V$
uniformly and independently at random among all $(2n)^k$ possible clauses. We say that an assignment
$\sigma : V \to \{0, 1\}$ is an NAE-solution (a "solution") if each clause has both a literal that evaluates
to 'true' under $\sigma$ and one that evaluates to 'false'. In other words, both $\sigma$ and its inverse
$\bar\sigma : x_i \mapsto 1 - \sigma(x_i)$ are satisfying assignments of the Boolean formula $\Phi$.

We say that an event occurs with high probability ("w.h.p.") if its probability tends to one as $n \to \infty$.
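The definition above can be made concrete with a minimal Python sketch. It is an illustration only: the names `random_formula`, `is_nae_solution` and `inverse`, the encoding of a literal as a `(variable, sign)` pair, and the rounding of $rn$ upwards are assumptions made here, not notation from the paper.

```python
import math
import random

def random_formula(n, m, k, rng):
    """Draw m clauses independently and uniformly from all (2n)^k ordered clauses.

    A literal is a pair (variable index, sign); sign True is the plain
    variable, sign False its negation. Variables may repeat within a clause.
    """
    return [[(rng.randrange(n), rng.random() < 0.5) for _ in range(k)]
            for _ in range(m)]

def literal_value(lit, sigma):
    var, positive = lit
    return sigma[var] if positive else 1 - sigma[var]

def is_nae_solution(formula, sigma):
    """True iff every clause has a literal evaluating to 1 and one evaluating to 0."""
    for clause in formula:
        if len({literal_value(lit, sigma) for lit in clause}) < 2:
            return False
    return True

def inverse(sigma):
    """The inverse assignment x -> 1 - sigma(x)."""
    return [1 - v for v in sigma]

rng = random.Random(0)
n, k, r = 12, 3, 1.5
m = math.ceil(r * n)            # rounding up is an assumption of this sketch
phi = random_formula(n, m, k, rng)

# Brute-force count of NAE-solutions; feasible only for tiny n.
count = sum(
    is_nae_solution(phi, [(bits >> i) & 1 for i in range(n)])
    for bits in range(2 ** n)
)
print("NAE-solutions:", count)  # always even, by the inversion symmetry
```

The brute-force count is only meant to make the inversion symmetry visible; it says nothing about the threshold, which concerns $n \to \infty$.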

Friedgut [22] proved that for any $k$ there exists a sharp threshold sequence
$r_{k\text{-NAE}} = r_{k\text{-NAE}}(n)$ such that for any fixed $\varepsilon > 0$ w.h.p. $\Phi$ has a
NAE-solution if $r < r_{k\text{-NAE}} - \varepsilon$, while w.h.p. $\Phi$ fails to have one if
$r > r_{k\text{-NAE}} + \varepsilon$. It is widely conjectured but as yet unproven that the threshold sequence
converges for any $k \ge 3$. The best previous bounds on $r_{k\text{-NAE}}$ were derived by Achlioptas and
Moore [3] and Coja-Oghlan and Zdeborova [12] via the first/second moment method:

$$r_{\mathrm{second}} = 2^{k-1}\ln 2 - \ln 2 + o_k(1) \le r_{k\text{-NAE}} \le r_{\mathrm{first}} = 2^{k-1}\ln 2 - \frac{\ln 2}{2} + o_k(1), \tag{1.1}$$

where $o_k(1)$ hides a term that tends to 0 for large $k$. This left an additive gap of
$\frac12 \ln 2 \approx 0.347$, which our main result closes.






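Theorem 1.1. There is a sequence $\varepsilon_k = 2^{-(1-o_k(1))k}$ such that

$$2^{k-1}\ln 2 - \left(\frac{\ln 2}{2} + \frac14\right) - \varepsilon_k \le r_{k\text{-NAE}} \le 2^{k-1}\ln 2 - \left(\frac{\ln 2}{2} + \frac14\right) + \varepsilon_k. \tag{1.2}$$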

While the numerical improvement obtained in Theorem 1.1 may seem modest, we are going to argue that
the result is conceptually quite significant for two reasons. First, we obtain (virtually) matching upper and
lower bounds for the first time in a random CSP of this type. Second, and perhaps even more importantly,
we devise a rigorous method for taming the condensation phenomenon. Indeed, condensation has been the
main obstacle to determining the precise thresholds in random CSPs for the past decade. To understand why,
we need to discuss the statistical mechanics picture and its relation to the second moment method.

2 Condensation and the second moment method 

The statistical mechanics perspective. We follow [27] to sketch the non-rigorous statistical mechanics
approach on random $k$-NAESAT. Let $S(\Phi) \subset \{0, 1\}^n$ denote the set of NAE-solutions of $\Phi$ and
let $Z(\Phi) = |S(\Phi)|$ be the number of solutions. We turn $S(\Phi)$ into a graph by considering two
solutions $\sigma, \tau$ adjacent if their Hamming distance is $o(n)$. According to [27], the 'shape' of
$S(\Phi)$ undergoes two substantial changes w.h.p. at certain densities
$0 < r_{\mathrm{sh}} < r_{\mathrm{cond}} < r_{k\text{-NAE}}$.

The first transition occurs at $r_{\mathrm{sh}} \sim 2^{k-1}\ln(k)/k$, almost a factor of $k$ below
$r_{k\text{-NAE}}$. Namely, for $r < r_{\mathrm{sh}}$, $S(\Phi)$ is (essentially) a connected graph. But in
the shattering phase $r_{\mathrm{sh}} < r < r_{\mathrm{cond}}$, $S(\Phi)$ splits into connected components
$S_1, \ldots, S_{N(\Phi)}$ called clusters that are mutually separated by a linear Hamming distance
$\Omega(n)$. Each cluster $S_i$ only comprises an exponentially small fraction of $S(\Phi)$. In particular,
the total number $N(\Phi)$ of clusters, the so-called complexity, is exponential in $n$. This "shattering" of
$S(\Phi)$ was indeed established rigorously in HI.

As the density $r$ increases beyond $r_{\mathrm{sh}}$, both the overall number $Z(\Phi)$ of solutions and the
number and sizes of the clusters shrink. However, the cluster sizes decrease at a slower rate than $Z(\Phi)$,
until at density $r_{\mathrm{cond}} = 2^{k-1}\ln 2 - \ln 2 + o_k(1)$ the largest cluster has size
$\Omega(Z(\Phi))$ w.h.p. In effect, in the condensation phase
$r_{\mathrm{cond}} < r < r_{k\text{-NAE}}$, the set $S(\Phi)$ still decomposes into an exponential number of
clusters $S_1, \ldots, S_{N(\Phi)}$, each of tiny diameter and all mutually separated by Hamming distance
$\Omega(n)$. But in contrast to the shattered phase, now the largest cluster contains a constant fraction of
the entire set $S(\Phi)$. Indeed, w.h.p. a bounded number of clusters contain a $(1 - o(1))$-fraction of all
solutions.

The dominance of a few large clusters in the condensation phase complicates the probabilistic nature of
the problem dramatically. To see why, consider the experiment of first choosing a random formula $\Phi$ and
then picking two solutions $\sigma, \tau \in S(\Phi)$ uniformly and independently. For
$r_{\mathrm{sh}} < r < r_{\mathrm{cond}}$, $\sigma, \tau$ likely belong to different clusters, and hence can
be expected to have a "large" Hamming distance. In fact, it is implicit in the previous work on the second
moment method that $\mathrm{dist}(\sigma, \tau) \sim n/2$ w.h.p. [3,12]. Intuitively, this means that the two
random solutions "decorrelate". By contrast, for $r_{\mathrm{cond}} < r < r_{k\text{-NAE}}$ both
$\sigma, \tau$ belong to the same large cluster with a non-vanishing probability. In effect, with a
non-vanishing probability their distance $\mathrm{dist}(\sigma, \tau)$ is tiny, reflecting that solutions in
the same cluster are heavily correlated.

The purpose of the physicists' Survey Propagation technique is precisely to deal with this type of
correlation. The basic idea is to work with a different, non-uniform probability distribution on $S(\Phi)$.
This SP distribution is induced by first choosing a cluster $S_i$ uniformly at random among
$S_1, \ldots, S_{N(\Phi)}$, and then selecting a solution in that cluster $S_i$ uniformly. Since the number
$N(\Phi)$ of clusters is (thought to be) exponential in $n$ throughout the condensation phase, two solutions
$\sigma', \tau'$ chosen independently from the SP distribution are expected to lie in distinct clusters and
thus to decorrelate w.h.p.
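The difference between the uniform distribution and the SP distribution can be illustrated on an explicit list of clusters. The sketch below is a toy under stated assumptions: the function names and the example cluster sizes are hypothetical, and the clusters are simply given as lists rather than computed from a formula.

```python
import random

def sample_uniform(clusters, rng):
    """Uniform over all solutions: a cluster is hit proportionally to its size."""
    return rng.choice([s for cluster in clusters for s in cluster])

def sample_sp(clusters, rng):
    """SP distribution: uniform cluster first, then a uniform solution inside it."""
    return rng.choice(rng.choice(clusters))

def same_cluster_prob_uniform(sizes):
    """Probability that two independent uniform solutions share a cluster."""
    total = sum(sizes)
    return sum((s / total) ** 2 for s in sizes)

def same_cluster_prob_sp(sizes):
    """Probability that two independent SP samples share a cluster."""
    return 1.0 / len(sizes)

# Toy "condensed" landscape: one cluster holds about half of all solutions.
sizes = [1000] + [1] * 999
print(same_cluster_prob_uniform(sizes))  # bounded away from 0
print(same_cluster_prob_sp(sizes))       # 1 / (number of clusters)
```

With one dominant cluster, two uniform samples collide in it with constant probability, whereas two SP samples collide with probability $1/N$, which is the decorrelation effect described above.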

Starting from this (appropriately formalized) decorrelation assumption, the SP formalism prescribes a
sequence of delicate (non-rigorous) steps to reduce the computation of the precise threshold
$r_{k\text{-NAE}}$ to the solution of a continuous variational problem for any $k \ge 3$ [14,31]. This
variational problem is itself highly non-trivial, but heuristic numerical techniques yield plausible
approximations for small values of $k$ [28]. Moreover, asymptotically for large $k$ the variational problem
can be solved analytically. This led to the conjecture that
$r_{k\text{-NAE}} = 2^{k-1}\ln 2 - (\frac{\ln 2}{2} + \frac14) + o_k(1)$ iMl, which Theorem 1.1 resolves.

Is Theorem 1.1 optimal? Of course, it would be interesting to prove that for any $k$, the precise threshold
$r_{k\text{-NAE}}$ equals the solution to the variational problem that the SP formalism spits out. However,
given that this continuous problem itself appears difficult to solve analytically (to say the very least), it
seems that such a result would merely establish the equivalence of two hard mathematical problems. Thus, we
believe that Theorem 1.1 marks the end of the line as far as an analytic/explicit computation of
$r_{k\text{-NAE}}$ is concerned.

The first and the second moment method. The above statistical mechanics picture holds the key to
understanding why the previous arguments did not suffice to pin down $r_{k\text{-NAE}}$ precisely. The best
previous bounds (1.1) were obtained by applying the first/second moment method to the number $Z(\Phi)$ of
solutions, or a closely related random variable.

With respect to the upper bound, if for some density $r$ the first moment $E[Z(\Phi)]$ tends to 0 as $n$ gets
large, then $Z(\Phi) = 0$ w.h.p. by Markov's inequality. Thus, $r_{k\text{-NAE}} \le r$. Indeed, it is not
difficult to verify that $E[Z(\Phi)] = o(1)$ for $r = r_{\mathrm{first}}$ lEl. This gives the upper bound in
(1.1).
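The first moment bound can be checked numerically. The sketch below assumes the standard identity $E[Z(\Phi)] = 2^n (1 - 2^{1-k})^m$ (each assignment is a solution with probability $(1 - 2^{1-k})^m$) and ignores the rounding in $m$; the function names are hypothetical.

```python
import math

def first_moment_exponent(r, k):
    """(1/n) ln E[Z] = ln 2 + r ln(1 - 2^{1-k}), treating m = r*n exactly."""
    return math.log(2) + r * math.log1p(-2.0 ** (1 - k))

def first_moment_root(k):
    """Density at which the exponent vanishes; E[Z] -> 0 above it."""
    return math.log(2) / -math.log1p(-2.0 ** (1 - k))

for k in (3, 5, 8, 12):
    exact = first_moment_root(k)
    leading = 2 ** (k - 1) * math.log(2) - math.log(2) / 2
    # The difference shrinks as k grows, consistent with an o_k(1) error term.
    print(k, round(exact, 6), round(leading, 6), exact - leading)
```

The root of the exponent agrees with $2^{k-1}\ln 2 - \frac{\ln 2}{2}$ up to a term that vanishes for large $k$, which is the form of $r_{\mathrm{first}}$ in (1.1).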

The purpose of the second moment method is to bound $r_{k\text{-NAE}}$ from below. The general approach is
this: suppose we can define a random variable $Y = Y(\Phi) \ge 0$ such that $Y > 0$ only if $\Phi$ has a
NAE-solution. Moreover, assume that for some density $r$, the second moment $E[Y^2]$ satisfies

$$E[Y^2] \le C \cdot E[Y]^2 \tag{2.1}$$

with $C = C(k) \ge 1$ dependent on $k$ but not on $n$. Then the Paley-Zygmund inequality
$P[Y > 0] \ge E[Y]^2 / E[Y^2]$ implies that

$$P[\Phi \text{ has a NAE-solution}] \ge P[Y > 0] \ge E[Y]^2 / E[Y^2] \ge 1/C > 0. \tag{2.2}$$

Because the $k$-NAESAT threshold is sharp, and as $C$ is independent of $n$, (2.2) implies that
$r_{k\text{-NAE}} \ge r$.
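For completeness, the form of the Paley-Zygmund inequality used here follows from the Cauchy-Schwarz inequality in one line. The derivation below is the standard textbook argument, not taken from the paper:

```latex
% Y is non-negative, so Y = Y \cdot \mathbf{1}\{Y > 0\}.
\begin{align*}
E[Y] = E\bigl[Y \cdot \mathbf{1}\{Y > 0\}\bigr]
     &\le \sqrt{E[Y^2]} \cdot \sqrt{P[Y > 0]}
     && \text{(Cauchy-Schwarz)} \\
\Longrightarrow \quad P[Y > 0]
     &\ge \frac{E[Y]^2}{E[Y^2]} \ge \frac{1}{C}
     && \text{(by the second moment condition } E[Y^2] \le C \cdot E[Y]^2\text{).}
\end{align*}
```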

The obvious choice of random variable is the number $Z(\Phi)$ of solutions. Since $Z(\Phi)^2$ is just the
number of pairs of NAE-solutions, the second moment can be written as

$$E[Z(\Phi)^2] = \sum_{\sigma, \tau \in \{0,1\}^n} P[\text{both } \sigma, \tau \text{ are NAE-solutions}]. \tag{2.3}$$
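The sum over pairs can be evaluated exactly for small $n$ by grouping pairs according to the number of variables on which they agree. The sketch below relies on a standard inclusion-exclusion computation that is not spelled out in the text: if $\sigma, \tau$ agree on a fraction $\alpha$ of the variables, a random clause is NAE-satisfied by both with probability $1 - 2^{2-k} + 2^{1-k}(\alpha^k + (1-\alpha)^k)$. The function names are hypothetical.

```python
import math

def pair_prob(alpha, k):
    """P[a random clause is NAE-satisfied by both assignments] at overlap alpha."""
    return 1 - 2.0 ** (2 - k) + 2.0 ** (1 - k) * (alpha ** k + (1 - alpha) ** k)

def second_moment_ratio(n, m, k):
    """E[Z^2] / E[Z]^2 for finite n, summing over the number a of agreeing variables."""
    second = sum(
        math.comb(n, a) * 2 ** n * pair_prob(a / n, k) ** m
        for a in range(n + 1)
    )
    first = 2 ** n * (1 - 2.0 ** (1 - k)) ** m
    return second / first ** 2

# At overlap 1/2 the two assignments behave as if independent.
print(pair_prob(0.5, 3), (1 - 2.0 ** (1 - 3)) ** 2)
# The ratio is what the second moment condition needs to keep bounded in n.
print(second_moment_ratio(40, 80, 3))
```

Pairs at overlap $1/2$ contribute exactly the "independent" value $(1 - 2^{1-k})^2$ per clause; any excess in the ratio comes from pairs whose overlap deviates from $1/2$, that is, from correlated solutions.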

Indeed, Achlioptas and Moore [3] proved that (2.1) is satisfied for $Y = Z(\Phi)$ if
$r < 2^{k-1}\ln 2 - (1 + \ln 2)/2$. Improving upon [3], Coja-Oghlan and Zdeborova [12] obtained the best
previous lower bound (1.1) by considering a slightly modified random variable $Z'(\Phi)$. Namely,
$Z'(\Phi) = Z(\Phi) \cdot \mathbf{1}_{\Phi \in \mathcal{A}}$, where $\mathcal{A}$ is a certain event such that
$\Phi \in \mathcal{A}$ w.h.p. In other words, $Z'(\Phi)$ is equal to $Z(\Phi)$ for almost all formulas, but a
small fraction of "bad" formulas (that would blow up the second moment) are excluded. Still, $Z'(\Phi)$ admits
a similar decomposition as (2.3) (one just has to condition on $\mathcal{A}$).

As (2.3) shows, the second moment analysis of either $Z(\Phi)$ or $Z'(\Phi)$ boils down to studying the
correlations amongst pairs of solutions. In fact, it was observed in [3,12] that a necessary condition for the
success of this approach is that two independently and uniformly chosen $\sigma, \tau \in S(\Phi)$ satisfy
$\mathrm{dist}(\sigma, \tau) \sim n/2$ w.h.p. But according to the statistical mechanics picture, this
decorrelation condition is violated for $r > r_{\mathrm{cond}}$ due to the presence of large clusters.
Therefore, it is not surprising that the best previous lower bound (1.1) on $r_{k\text{-NAE}}$ coincides with
the (conjectured) condensation threshold $r_{\mathrm{cond}}$. Indeed, it was verified in [12] that a certain
"weak" form of condensation sets in at $r \sim r_{\mathrm{cond}}$.

The statistical mechanics prescription to overcome these correlations is to work with the Survey Propagation
distribution (first select a cluster uniformly, then choose a random solution from that cluster) rather
than the uniform distribution over $S(\Phi)$. This is precisely the key idea behind our new SP-inspired second
moment argument. Roughly speaking, we are going to develop a way to apply the second moment method
to the number $N(\Phi)$ of clusters, rather than the number of solutions. More precisely, we introduce a
parameter $\beta$ that allows us to work with clusters of a prescribed size. A specific choice of $\beta$
(namely, $\beta = 1/2$) corresponds to the SP distribution and thus to working with $Y(\Phi) = N(\Phi)$.

This new technique allows us to obtain various further results. For instance, we can pin down the typical
values of both $Z(\Phi)$ and $N(\Phi)$ throughout the condensation phase (details omitted). Furthermore, our
proof entails the following result that confirms the physics conjecture that pairs of solutions drawn from the
SP distribution decorrelate throughout the condensation phase.

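Corollary 2.1. Suppose that
$r_{\mathrm{cond}} \le r \le 2^{k-1}\ln 2 - (\frac{\ln 2}{2} + \frac14) - \varepsilon_k$. Let
$\sigma', \tau'$ be drawn independently from the SP distribution. Then
$\mathrm{dist}(\sigma', \tau') = (\frac12 + o_k(1))n$ w.h.p.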

3 Related work 

Rigorous work. The $k$-NAESAT problem is well-known to be NP-complete in the worst case for any $k \ge 3$.
In fact, the NP-complete problem of 2-coloring a $k$-uniform hypergraph (with $k \ge 3$) simply is the special
case of $k$-NAESAT without negations. The results in [12] are actually phrased in terms of hypergraph
2-coloring but carry over to $k$-NAESAT directly.

The main contribution of Theorem 1.1 is the improved lower bound. In fact, the upper bound in (1.2) can
be obtained in several different ways. Achlioptas and Moore [3] state without proof that the (quite intricate)
enhanced first moment argument from [16,26] can be used to show that
$r_{k\text{-NAE}} \le 2^{k-1}\ln 2 - (\frac{\ln 2}{2} + \frac14) + o_k(1)$. This is indeed plausible as, in
terms of the statistical mechanics intuition (which was unknown to the authors of [16,26]), this argument
amounts to computing the first moment of the number of clusters. Alternatively, generalizing work of Franz and
Leone iljj, Panchenko and Talagrand OTI proved that the variational problem that results from the SP formalism
yields a rigorous upper bound on $r_{k\text{-NAE}}$, which is conjectured to be tight for any $k \ge 3$. The
variational problem can be solved asymptotically in the large-$k$ limit (unpublished), yielding the upper
bound stated in Theorem 1.1. In this paper we obtain the upper bound by a relatively simple third argument
that has a neat combinatorial interpretation.

The proofs of the lower bounds in [3,12] and in the present paper are non-constructive in the sense that
they do not entail an efficient algorithm for finding a NAE-solution w.h.p. The best current algorithm for
random $k$-NAESAT is known to succeed for $r < O_k(2^k/k)$, a factor of $\Omega_k(k)$ below
$r_{k\text{-NAE}}$ lEl.

From a statistical mechanics point of view, many random CSPs are similar to random $k$-NAESAT. In
particular, the physics methods suggest the existence of a condensation phase in most random CSPs (e.g.,
random $k$-SAT/graph $k$-coloring). While [3] provided the prototype for the second moment arguments in
these and other problems, the technical details in random graph $k$-coloring [4] or random $k$-SAT are
quite a bit more intricate than in random $k$-NAESAT.

For instance, random $k$-NAESAT is simpler than random $k$-SAT because for any NAE-solution $\sigma$ the
inverse $\bar\sigma : x \mapsto 1 - \sigma(x)$ is a NAE-solution as well. This symmetry of the solution space
under inversion simplifies the second moment calculations significantly. To cope with the absence of symmetry
in random $k$-SAT, Achlioptas and Peres [6] weighted satisfying assignments cleverly in order to recover the
beneficial analytic properties that symmetry induces. Our new second moment method is quite different from
this weighting approach, since the asymmetry that called for the weighting scheme in [6] is absent in
$k$-NAESAT.

None of the (few) random CSPs in which the threshold for the existence of solutions is known precisely
has a condensation phase. The most prominent example is random $k$-XORSAT (random linear equations
mod 2) [17,32]. In this case, the algebraic nature of the problem precludes condensation: all clusters are
simply translations of the kernel. Similarly, the condensation phase is empty in the uniquely extendible
problem from [13]. Also in random $k$-SAT with $k = k(n) > \log_2 n$ (i.e., the clause length grows as a
function of $n$), where the precise threshold has been determined by Frieze and Wormald [23] via the second
moment method, condensation does not occur ifTTI. Nor does it in random 2-SAT [8,25].






Parts of our proof require a precise analysis of the geometry of the solution space $S(\Phi)$. This analysis
harnesses some of the ideas that were developed in previous work [1,7,12,15] (e.g., arguments for proving
the existence of clusters or of "rigid variables"). However, we need to go beyond these previous arguments
significantly in two respects. First, we need to generalize them to accommodate the parameter $\beta$ that
controls the cluster sizes. Second, we need rather precise quantitative information about the cluster
structures.

Survey Propagation guided decimation. The SP formalism has given rise to an efficient message passing
algorithm called Survey Propagation guided decimation ('SPD') [29]. Experimentally, SPD seems spectacularly
successful at solving, e.g., random $k$-SAT for small values of $k$. Unfortunately, no quantitative analysis
of this algorithm is currently known (not even a non-rigorous one). The basic idea behind SPD is to
approximate the marginals of the SP distribution (i.e., the probability that a given variable is 'true' in a
solution drawn from the SP distribution) via a message passing heuristic. Then a variable $x$ is selected
according to some rule and is assigned a value based on the (approximate) marginal. The entire procedure is
repeated on the "decimated" problem instance where $x$ has been eliminated, until (hopefully) a solution is
found.

The decorrelation of random solutions chosen from the SP distribution is a crucial assumption behind the
message passing computation of the SP marginals. Corollary 2.1 establishes such a decorrelation property
rigorously. However, in order to actually analyze SPD, one would have to generalize Corollary 2.1 to the
situation of a "decimated" random formula in which a number of variables have already been eliminated by
previous steps of the algorithm. Still, we believe that the techniques developed in this paper are a
(necessary) first step towards a rigorous analysis of SPD.

4 Heavy solutions and the first moment 

In the rest of the paper we sketch the SP-inspired second moment method on which the proof of Theorem 1.1
is based. Aiming for an asymptotic result, we may assume that $k \ge k_0$ for some (large) constant
$k_0 \ge 3$. We also assume $r = 2^{k-1}\ln 2 - \rho$ for some $\frac12 \ln 2 \le \rho \le \ln 2$. Let
$\Phi_i$ denote the $i$th clause of the random formula $\Phi$ so that
$\Phi = \Phi_1 \wedge \cdots \wedge \Phi_m$. Furthermore, let $\Phi_{ij}$ signify the $j$th literal of clause
$\Phi_i$; thus, $\Phi_i = \Phi_{i1} \vee \cdots \vee \Phi_{ik}$. For a literal $\ell$ we let $|\ell|$ denote
the underlying variable.

As we discussed earlier, the demise of the "standard" second moment method in the condensation phase
is due to the dominance of few large clusters. The statistical mechanics prescription for circumventing this
issue is to work with a non-uniform distribution over solutions that favors "small" clusters. To implement
this strategy, we are going to exhibit a simple parameter that governs the size of the cluster that a solution
belongs to. Formally, we define the cluster of $\sigma \in S(\Phi)$ as

$$C(\sigma) = C_\Phi(\sigma) = \{\tau \in S(\Phi) : \mathrm{dist}(\sigma, \tau) \le 0.01n\}.$$

This definition is vindicated by the following observation from [12], which shows that any two solutions
either have the same cluster or are well-separated.

Proposition 4.1. Suppose that $2^{k-1}\ln 2 - \ln 2 \le r \le r_{k\text{-NAE}}$. W.h.p. any two
$\sigma, \tau \in S(\Phi)$ either satisfy $\mathrm{dist}(\sigma, \tau) \le 0.01n$ or
$\mathrm{dist}(\sigma, \tau) \ge (\frac12 - 2^{-k/3})n$.

To proceed, we need to get an idea of the "shape" of the clusters $C(\sigma)$. According to the SP formalism,
each cluster has a set $\mathcal{R}(\sigma)$ of $\Omega(n)$ rigid variables on which all assignments in
$C(\sigma)$ coincide, while the values of the non-rigid variables vary. Formally, we have
$\tau(x) = \sigma(x)$ for all $x \in \mathcal{R}(\sigma)$ and all $\tau \in C(\sigma)$, while for each
$x \notin \mathcal{R}(\sigma)$ there is $\tau \in C(\sigma)$ such that $\tau(x) \ne \sigma(x)$. This implies
an immediate bound on the size of $C(\sigma)$, namely $|C(\sigma)| \le 2^{n - |\mathcal{R}(\sigma)|}$. Indeed,
we are going to prove that every cluster has a rigid set of size $\Omega(n)$ w.h.p., and that for all
clusters w.h.p.

$$\log_2 |C(\sigma)| = (1 - o_k(1))(n - |\mathcal{R}(\sigma)|). \tag{4.1}$$
With $|C(\sigma)|$ controlled by the number of rigid variables, it might seem promising to perform
first/second moment arguments for the number of solutions with a suitably chosen number of rigid variables.
The problem with this is that there is no simple way to tell whether a given variable is rigid: deciding this
is NP-hard in the worst case. Intuitively, this is because rigidity emerges from the "global" interplay of
variables and clauses. In effect, parametrizing by the number of rigid variables appears technically
infeasible.

Instead, we are going to work with a simple "local" parameter that turns out to be a good substitute.
Suppose that $x \in \mathcal{R}(\sigma)$. Then $x$ must occur in some clause $\Phi_i$ that would be violated
if $x$ was assigned the opposite value $1 - \sigma(x)$ (with all other variables unchanged). By the definition
of $k$-NAESAT, this means that the other $k - 1$ literals of $\Phi_i$ take the opposite value of the literal
whose underlying variable is $x$. In this case we say that $x$ supports $\Phi_i$ under $\sigma$, and we call
$\Phi_i$ a critical clause. Moreover, we call a variable that supports a clause blocked, while all other
variables are free. While every rigid variable is blocked, the converse is not generally true. Nonetheless, we
will see that the number of variables that are blocked but not rigid is small enough so that we can control
the cluster sizes in terms of blocked variables.
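Unlike rigidity, the notions of critical clause, blocked variable and free variable can be read off locally. The following minimal sketch shows one way to do so; the function names and the `(variable, sign)` literal encoding are assumptions of this illustration, and the code assumes $k \ge 3$ and that the variables within a clause are distinct.

```python
def literal_value(lit, sigma):
    var, positive = lit
    return sigma[var] if positive else 1 - sigma[var]

def supporting_variable(clause, sigma):
    """Return the variable that supports `clause` under sigma, or None.

    The clause is critical iff exactly one literal takes a value different
    from the other k - 1 literals; that literal's variable is the supporter.
    """
    values = [literal_value(lit, sigma) for lit in clause]
    ones = sum(values)
    if ones == 1:
        return clause[values.index(1)][0]
    if ones == len(values) - 1:
        return clause[values.index(0)][0]
    return None

def blocked_and_free(formula, sigma):
    """Split the variables into blocked ones (supporting a clause) and free ones."""
    blocked = {supporting_variable(c, sigma) for c in formula} - {None}
    free = set(range(len(sigma))) - blocked
    return blocked, free

# Two clauses over x0..x3: (x0 v x1 v x2) and (x1 v not-x2 v x3).
formula = [[(0, True), (1, True), (2, True)],
           [(1, True), (2, False), (3, True)]]
sigma = [1, 0, 0, 1]
print(blocked_and_free(formula, sigma))  # x0 and x1 are blocked, x2 and x3 are free
```

Computing the blocked set takes one pass over the formula, which is what makes it a usable substitute for the rigid set in the moment calculations.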

As a first step, we are going to estimate the expected number of solutions with a given number of
blocked variables. Let $\lambda = \frac{kr}{2^{k-1} - 1} = k \ln 2 + O_k(k/2^k)$ and let us say that
$\sigma \in S(\Phi)$ is $\beta$-heavy if exactly $(1 - \beta)\exp(-\lambda)n$ variables are free. Let
$S_\beta(\Phi)$ be the set of all $\beta$-heavy solutions and let $Z_\beta = |S_\beta(\Phi)|$ denote their
number.

Proposition 4.2. For any $\beta < 1$ we have

$$E[Z_\beta] = \exp\left[\frac{n}{2^k}\left(2\rho - \ln 2 - (1 - \beta)\ln(1 - \beta) - \beta + O_k(k \cdot 2^{-k})\right)\right]. \tag{4.2}$$

In particular, $Z_\beta = 0$ for all $\beta < -3/2$ w.h.p.

Proof. The computation of $E[Z_\beta]$ is instructive because it hinges upon the solution of an occupancy
problem that will play an important role in the second moment computation. Let $\mathbf{1}$ denote the
assignment that sets all variables to true. By the linearity of expectation and by symmetry, we have

$$E[Z_\beta] = \sum_{\sigma \in \{0,1\}^n} P[\sigma \text{ is a } \beta\text{-heavy solution}] = 2^n \cdot P[\mathbf{1} \text{ is a } \beta\text{-heavy solution}]$$
$$= 2^n \cdot P[\mathbf{1} \text{ is } \beta\text{-heavy} \mid \mathbf{1} \text{ is a solution}] \cdot P[\mathbf{1} \text{ is a solution}].$$

Clearly, $\mathbf{1}$ is a solution iff each clause of $\Phi$ contains both a positive and a negative literal.
A random clause has this property with probability $1 - 2^{1-k}$. Since the $m \sim rn$ clauses are chosen
independently, we get

$$2^n \cdot P[\mathbf{1} \text{ is a solution}] = 2^n (1 - 2^{1-k})^m = \exp\left[\frac{n}{2^k}\left(2\rho - \ln 2 + O_k(2^{-k})\right)\right].$$

Working out the conditional probability that $\mathbf{1}$ is $\beta$-heavy is not so straightforward. Whether
$\mathbf{1}$ is $\beta$-heavy depends only on the critical clauses of $\Phi$. Let $X$ be their number. Given
that $\mathbf{1}$ is a solution, each clause is critical with probability $k/(2^{k-1} - 1)$ independently (as
there are $2k$ ways to choose the literal signs to obtain a critical clause). Hence, $X$ has a binomial
distribution $\mathrm{Bin}(m, k/(2^{k-1} - 1))$ with mean

$$E[X \mid \mathbf{1} \in S(\Phi)] = \frac{km}{2^{k-1} - 1} = \lambda n.$$

Since the supporting variable of each critical clause is uniformly distributed, given
$\mathbf{1} \in S(\Phi)$ the expected number of clauses that each variable supports equals $\lambda$. Thinking
of the variables as bins and of the critical clauses as balls, standard results on the occupancy problem show
that the number of free variables is $(1 + o(1))\exp(-\lambda)n$ w.h.p. Thus, $E[Z_\beta]$ is maximized for
$\beta = 0$.

By contrast, values $\beta \ne 0$ correspond to atypical outcomes of the occupancy problem. Values
$\beta < 0$ require an excess number of "empty bins", while $\beta > 0$ means that fewer bins than expected
are empty. To determine the precise (exponentially small) probability of getting
$(1 - \beta)\exp(-\lambda)n$ empty bins, we need to balance large deviations of $X$ against the probability
that exactly $(1 - \beta)\exp(-\lambda)n$ bins remain empty for a given value of $X$. The result of this
combined large deviations analysis is the expression (4.2). The analysis also shows that
$E[Z_\beta] = \exp(-\Omega(n))$ for $\beta < -3/2$, whence $Z_\beta = 0$ w.h.p. for $\beta < -3/2$. □
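The typical outcome of the occupancy problem in the proof above is easy to check by simulation. The sketch below is a simplification: it throws a fixed number of about $\lambda n$ balls rather than a binomially distributed number $X$, and it takes $\lambda = k \ln 2$, dropping the $O_k(k/2^k)$ correction. The function name is hypothetical.

```python
import math
import random

def empty_fraction(n, lam, rng):
    """Throw round(lam * n) balls into n bins uniformly; return the fraction of empty bins."""
    hit = [False] * n
    for _ in range(round(lam * n)):
        hit[rng.randrange(n)] = True
    return hit.count(False) / n

rng = random.Random(1)
k = 5
lam = k * math.log(2)            # leading-order value of lambda
frac = empty_fraction(50_000, lam, rng)
print(frac, math.exp(-lam))      # the two numbers should be close
```

In the language of the proof, empty bins are free variables, so a fraction of roughly $\exp(-\lambda) \approx 2^{-k}$ of the variables is free in a typical solution; this is the case $\beta = 0$.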

As a next step, we need to estimate the cluster size of a /3-heavy solution. 

Proposition 4.3. W.h.p. for all $-3/2 \le \beta < 1$ all $\beta$-heavy $\sigma \in S(\Phi)$ satisfy

$$\log_2 |C(\sigma)| = \frac{n}{2^k}\left[1 - \beta + o_k(1)\right]. \tag{4.3}$$

Proof. The crucial thing to show is that all but a very few blocked variables are rigid. The proof of this
builds upon arguments developed in HI to establish rigidity. Suppose that $x$ is blocked in
$\sigma \in S(\Phi)$, i.e., $x$ supports some clause, say $\Phi_i$. In any solution $\tau$ with
$\tau(x) \ne \sigma(x)$ there must be another variable $x'$ that occurs in $\Phi_i$ such that
$\tau(x') \ne \sigma(x')$. Given that $x$ supports $\Phi_i$, the other $k - 1$ variables of $\Phi_i$ are
uniformly distributed. Since $\sigma$ has no more than
$(1 - \beta)\exp(-\lambda)n = (1 - \beta + o_k(1))2^{-k}n$ free variables, the probability that $x'$ is free
is bounded by $(1 - \beta + o_k(1))(k - 1)/2^k$. In fact, since the expected number of clauses that each
variable supports is $\lambda = (1 + o_k(1))k \ln 2$, it is quite likely that $x'$ supports several clauses
and that therefore "flipping" $x'$ necessitates several further flips. Continuing this argument, we see that
the number of flips follows a branching process with (initial) successor rate $\lambda$. A detailed analysis
shows that for all but $O_k(k4^{-k})n$ blocked initial variables $x$ this process will lead to an avalanche of
more than $0.01n$ flips, whence $\tau \notin C(\sigma)$. This shows that all but $o_k(2^{-k})n$ blocked
variables are rigid. □

We are ready to prove that $r_{k\text{-NAE}} \le 2^{k-1}\ln 2 - (\frac{\ln 2}{2} + \frac14) + o_k(1)$, which
is (almost) the upper bound promised in Theorem 1.1. (Some additional technical work is needed to replace the
$o_k(1)$ by an error term that decays exponentially.) Let
$N_\beta = |\{C(\sigma) : \sigma \in S(\Phi) \text{ is } \beta\text{-heavy}\}|$ be the number of clusters
centered around $\beta$-heavy solutions. By Proposition 4.3 each such cluster has size
$|C(\sigma)| = 2^{n(1 - \beta + o_k(1))/2^k}$ w.h.p. Hence, once more by Proposition 4.3 any solution
$\tau \in C(\sigma)$ is $\beta'$-heavy for some $\beta'$ satisfying $|\beta' - \beta| < \delta_k = o_k(1)$
w.h.p. Letting $Z'_\beta$ be the total number of $\beta'$-heavy solutions with
$|\beta' - \beta| < \delta_k$, we conclude that

$$N_\beta \cdot 2^{n(1 - \beta + o_k(1))/2^k} \le Z'_\beta \quad \text{w.h.p.} \tag{4.4}$$

Clearly, $Z'_\beta \le E[Z'_\beta] \cdot \exp(o(n))$ w.h.p. by Markov's inequality. Furthermore, as the total
number of free variables in each cluster is an integer between 0 and $n$, we have
$E[Z'_\beta] \le (n + 1) \cdot \max_{\beta'} E[Z_{\beta'}]$. Combining these inequalities with the estimate of
$E[Z_{\beta'}]$ from Proposition 4.2 we find

$$Z'_\beta \le \exp[o(n)] \, E[Z'_\beta] \le \exp\left[\frac{n}{2^k}\left(2\rho - \ln 2 - (1 - \beta)\ln(1 - \beta) - \beta + o_k(1)\right)\right] \quad \text{w.h.p.} \tag{4.5}$$

Combining (4.4) and (4.5), we obtain

Fact 4.4. W.h.p. we have $N_\beta \le \exp[\eta(\beta) \cdot n/2^k]$ for all $\beta$, with

$$\eta(\beta) = 2\rho - \ln 2 - (1 - \beta)\ln(2 - 2\beta) - \beta + o_k(1). \tag{4.6}$$

Finally, it is a mere exercise in calculus to verify that at density
$r^* = 2^{k-1}\ln 2 - (\frac{\ln 2}{2} + \frac14) + o_k(1)$ the exponent $\eta(\beta)$ is negative for all
$\beta$. Therefore, Fact 4.4 implies that $r^*$ is an upper bound on $r_{k\text{-NAE}}$.
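The calculus exercise can be spelled out as follows. This is a sketch that drops the $o_k(1)$ term in the exponent $\eta(\beta) = 2\rho - \ln 2 - (1-\beta)\ln(2-2\beta) - \beta$ and uses $r = 2^{k-1}\ln 2 - \rho$:

```latex
\begin{align*}
\eta'(\beta) &= \ln(2 - 2\beta) + (1 - \beta)\cdot\frac{2}{2 - 2\beta} - 1 = \ln(2 - 2\beta),
  &\eta''(\beta) &= -\frac{1}{1 - \beta} < 0 \quad (\beta < 1), \\
\eta'(\beta) = 0 &\iff \beta = \tfrac12,
  &\max_{\beta < 1}\eta(\beta) &= \eta\bigl(\tfrac12\bigr) = 2\rho - \ln 2 - \tfrac12 .
\end{align*}
% Hence \eta(\beta) < 0 for all \beta as soon as \rho < \frac{\ln 2}{2} + \frac14,
% i.e. as soon as r > 2^{k-1}\ln 2 - \bigl(\frac{\ln 2}{2} + \frac14\bigr).
```

Since $\eta$ is concave on $\beta < 1$, the stationary point $\beta = \frac12$ is the global maximum, and the sign of $\eta(\frac12)$ alone decides whether any value of $\beta$ admits clusters.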

Remark 5. The exponent $\eta(\beta)$ attains its maximum at $\beta = \frac12 + o_k(1)$. Together with our
second moment bound below, this implies that for $\beta = \frac12 + o_k(1)$ we have
$N(\Phi) = \exp(o_k(1)n) \cdot N_\beta(\Phi)$ w.h.p., i.e., setting $\beta = \frac12 + o_k(1)$ corresponds to
the uniform distribution over clusters and thus to the SP distribution.
5 The second moment



A first attempt. The obvious approach to proving a matching lower bound on $r_{k\text{-NAE}}$ seems to be a
second moment argument for the number $Z_\beta$ of $\beta$-heavy solutions, for some suitable $\beta$. There
is a subtle issue with this, but exploring it will put us on the right track.

We already computed $E[Z_\beta]$ in Proposition 4.2. As $E[Z_\beta^2]$ is the expected number of pairs of
$\beta$-heavy solutions, the symmetry properties of the random formula $\Phi$ imply that

$$E[Z_\beta^2] = E[Z_\beta] \cdot E[Z_\beta \mid \sigma \in S_\beta(\Phi)] \quad \text{for any fixed } \sigma \in \{0, 1\}^n.$$

Thus, the second moment condition (2.1) that we would like to establish for $Y = Z_\beta$ becomes

$$E[Z_\beta \mid \sigma \in S_\beta(\Phi)] \le C \cdot E[Z_\beta]. \tag{5.1}$$

What value of $\beta$ should we go for? By Fact 4.4 a necessary condition for the existence of $\beta$-heavy
solutions is that the exponent $\eta(\beta)$ from (4.6) is positive. Let us call $\beta$ feasible for a
density $r$ if it is. An elementary calculation shows that for
$r > r_{\mathrm{cond}} = 2^{k-1}\ln 2 - \ln 2 + o_k(1)$, any feasible $\beta$ is strictly positive.

However, (5.1) turns out to be false for any $\beta > 0$, for any density $r > 0$. To understand why, let
us define the degree of a variable $x \in V$ as the number of times that $x$ occurs in the formula $\Phi$. Let
$d = (d_x)_{x \in V}$ be the degree sequence of $\Phi$. It is well known that in the "plain" random formula
$\Phi$ (without conditioning on $\sigma \in S_\beta(\Phi)$) the degree of each variable is asymptotically
Poisson with mean $km/n$. On the other hand, if we condition on $\sigma \in S_\beta(\Phi)$ for some
$\beta > 0$, then the degrees are not asymptotically Poisson anymore. Indeed, the degree $d_x$ is the sum of
the number $s_x$ of clauses that $x$ supports, and the number $d'_x$ of times that $x$ appears otherwise.
While $d'_x$ is asymptotically Poisson with mean $< km/n$ as the non-critical clauses do not affect the number
of blocked variables at all, $s_x$ is not. More precisely, we saw in the proof of Proposition 4.2 that for
$\beta > 0$, $s_x$ is the number of "balls" that $x$ receives in an atypical outcome of the occupancy problem.
The precise distribution of $s_x$ is quite non-trivial, but it is not difficult to verify that $s_x$ does not
have a Poisson distribution. Fleshing this observation out leads to the following sobering result.

Lemma 5.1. For any $\beta > 0$ and any $r > 0$ we have $E[Z_\beta \mid \sigma \in S_\beta(\Phi)] \ge \exp(\Omega(n)) \cdot E[Z_\beta]$.

In summary, conditioning on $\sigma \in S_\beta(\Phi)$ with $\beta > 0$ imposes a skewed degree distribution that in turn
boosts the expected number of $\beta$-heavy solutions beyond the unconditional expectation.
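
The claim that degrees in the unconditioned formula are asymptotically Poisson with mean $km/n$ can be probed numerically. The following is only an illustrative sketch: it assumes that each of the $km$ literal slots picks its variable uniformly and independently (a common simplification, not necessarily the exact model of this paper), and the function names are hypothetical. For a Poisson law the variance equals the mean, so the ratio of the two should be close to 1.

```python
import random
from collections import Counter


def random_formula_degrees(n, m, k, seed=0):
    """Degree sequence of a random k-CNF with m clauses over n variables.

    Assumption: every one of the k*m literal slots picks its variable
    uniformly and independently; signs do not matter for the degrees.
    """
    rng = random.Random(seed)
    counts = Counter(rng.randrange(n) for _ in range(k * m))
    return [counts[x] for x in range(n)]


def mean_and_variance(values):
    """Empirical mean and (population) variance of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


n, m, k = 2000, 8000, 3
degrees = random_formula_degrees(n, m, k)
mean, variance = mean_and_variance(degrees)
print("mean degree:", mean, "variance/mean:", variance / mean)
```

The mean degree is exactly $km/n$ by construction; the interesting quantity is the variance-to-mean ratio, which would deviate from 1 for the skewed degree distribution described above.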

Making things work. We tackle the issue of degree fluctuations by separating the choice of the degree
sequence from the choice of the actual formula. More precisely, for a sequence $d = (d_x)_{x \in V}$ of non-negative
integers such that $\sum_{x \in V} d_x = km$, we let $\Phi_d$ denote a $k$-CNF with degree sequence $d$ chosen uniformly at
random amongst all such formulas. Fixing a "typical" degree sequence $d$, we are going to perform a second
moment argument for $\Phi_d$, thereby preventing fluctuations of the degrees.

How do we define "typical"? Ideally, we would like $d$ to enjoy all the properties that the degree sequence
of the (unconditioned) random formula $\Phi$ is likely to have. Formally, we let $D = D_k(n, m)$ be the distribution
of the degree sequence of $\Phi$. What we are going to show is that our second moment argument succeeds
for a random degree sequence chosen from the distribution $D$ w.h.p.

Definition 5.2. A $\beta$-heavy solution $\sigma \in S_\beta(\Phi_d)$ is good if the following conditions are satisfied.

• We have $|C(\sigma)| \le E[Z_\beta(\Phi_d)]$.

• There does not exist $\tau \in S(\Phi_d)$ with $0.01n < \mathrm{dist}(\sigma, \tau) < (\frac12 - 2^{-k/3})n$.

• No variable supports more than $3k$ clauses under $\sigma$.

The first two items mirror our analysis of the solution space from Section 4. The third one turns out to be
useful for a purely technical reason.

Let $S_{g,\beta}(\Phi_d)$ be the set of good $\beta$-heavy solutions and set $Z_{g,\beta}(\Phi_d) = |S_{g,\beta}(\Phi_d)|$. We perform a second
moment argument for $Z_{g,\beta}(\Phi_d)$ with $d$ chosen randomly from the distribution $D$. The result is

Proposition 5.3. Suppose that $\beta > 0$ is feasible. There is $C = C(k)$ such that for a degree sequence $d$
chosen from the distribution $D$ w.h.p. $E[Z_{g,\beta}(\Phi_d)^2] \le C \cdot E[Z_{g,\beta}(\Phi_d)]^2$.

Proposition 5.3 shows that the second moment method for $Z_{g,\beta}(\Phi_d)$ succeeds for feasible $\beta$. As we
observed in SectionlH a feasible /3 > exists so long as r < 2^^^ In 2 - + i) - Ok{k'^/2''). Hence, 
Proposition 5.3 and the Paley-Zygmund inequality show that $\Phi_d$ is NAE-satisfiable for all such $r$ with a
non-vanishing probability for $d$ chosen randomly from $D$. Consequently, the same is true of the unconditioned
formula $\Phi$ (because we could generate $\Phi$ by first choosing $d$ from $D$ and then generating $\Phi_d$). Since the
$k$-NAESAT threshold is sharp [22], we obtain the lower bound in Theorem 1.1.
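
To make the role of the Paley-Zygmund inequality concrete: for a non-negative random variable $Z$ it gives $P[Z > 0] \ge E[Z]^2 / E[Z^2]$, so a bound $E[Z^2] \le C \cdot E[Z]^2$ yields $P[Z > 0] \ge 1/C$. The sketch below checks this on two toy distributions; these distributions and the function name are hypothetical illustrations and have nothing to do with the actual law of $Z_{g,\beta}(\Phi_d)$.

```python
from fractions import Fraction


def paley_zygmund(dist):
    """Return (P[Z > 0], E[Z]^2 / E[Z^2]) for a finite distribution.

    `dist` maps the non-negative values of Z to their probabilities.
    """
    first = sum(p * z for z, p in dist.items())
    second = sum(p * z * z for z, p in dist.items())
    positive = sum(p for z, p in dist.items() if z > 0)
    return positive, first * first / second


# Toy example 1: Z is usually 0 but occasionally huge; the bound is tight.
prob_a, bound_a = paley_zygmund({0: Fraction(9, 10), 100: Fraction(1, 10)})

# Toy example 2: a less extreme distribution; the bound is not tight.
prob_b, bound_b = paley_zygmund(
    {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
)

print(prob_a, bound_a)
print(prob_b, bound_b)
```

The first example shows why a large first moment alone is not enough: the expectation is 10, yet $Z > 0$ only with probability $1/10$, which is exactly what the second moment detects.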

Proving Proposition 5.3. As a first step, we need to work out $E[Z_{g,\beta}(\Phi_d)]$. Suppose $\beta > 0$ is feasible.

Recall that $\rho$ is such that $r = 2^{k-1} \ln 2 - \rho$.

Lemma 5.4. W.h.p. the degree sequence $d$ chosen from $D$ is such that

$$E[Z_{g,\beta}(\Phi_d)] \sim E[Z_\beta(\Phi_d)] = \exp\left[\frac{n}{2^k}\left(2\rho - \ln 2 - (1-\beta)\ln(1-\beta) - \beta + O_k(k/2^k)\right)\right].$$

Proof. Choose and fix a degree sequence $d$. We need to compute the probability that some $\sigma \in \{0,1\}^V$ is
a good $\beta$-heavy solution. By symmetry, we may assume that $\sigma = \mathbf{1}$ is the all-true assignment. Then $\sigma$ is
a solution iff every clause contains both a positive and a negative literal. Since the signs of the literals are
chosen for all $m$ clauses independently, we see that

$$P[\sigma \in S(\Phi_d)] = (1 - 2^{1-k})^m. \tag{5.2}$$

Given that $\sigma$ is a solution, the number $X$ of critical clauses has distribution $\mathrm{Bin}(m, k/(2^{k-1} - 1))$, because
whether a clause is critical depends on its signs only. As in the proof of Proposition 4.2, to determine the
probability that $\sigma$ is $\beta$-heavy we need to solve an occupancy problem: $X$ balls representing the critical
clauses are tossed randomly into $n$ bins representing the variables. However, this time the bins have
capacities: the bin representing $x \in V$ can hold no more than $\min\{3k, d_x\}$ balls in total. Thus, we need to
compute the probability that under these constraints, exactly $(1-\beta)2^{-k}n$ bins are empty. This amounts to
a rather non-trivial counting problem, but for a random degree sequence $d$ the probability differs from the
formula obtained in Proposition 4.2 only by an error term that decays exponentially in $k$. More precisely,

$$P[\sigma \in S_\beta(\Phi_d) \mid \sigma \in S(\Phi_d)] = \exp\left(-\frac{n}{2^k}\left[(1-\beta)\ln(1-\beta) + \beta - O_k(k/2^k)\right]\right). \tag{5.3}$$

Let us provide some intuition why this is. The bin capacities are such that w.h.p. most bins can hold about
$kr = k2^{k-1} \ln 2 + O_k(k)$ balls. By comparison, the total number of balls is $X \sim mk/(2^{k-1} - 1) \sim_k nk \ln 2$
w.h.p. In effect, the expected number of balls that a typical bin receives is about $k \ln 2$, way smaller than the
capacity of that bin. Indeed, since the number of balls that are received by a typical bin is approximately
$\mathrm{Bin}(kr, \frac{1}{2^{k-1}-1}) \approx \mathrm{Bin}(kr, 2^{-k+1})$, the number of balls can be approximated well by a $\mathrm{Po}(\lambda)$ distribution
(with $\lambda = kr/(2^{k-1} - 1) \sim_k k \ln 2$). Thus, the probability that a bin remains empty is close to $\exp(-\lambda)$,
which was the probability of the same event in the experiment without capacities. The technical details of
this argument are quite delicate, as the fluctuations of the capacities need to be controlled very carefully.
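
The Poisson heuristic in the preceding paragraph is easy to sanity-check numerically. The sketch below is an illustration only: it assumes a density of exactly $r = 2^{k-1}\ln 2$ (the leading term of the threshold), treats a typical bin as having about $kr$ slots that are each occupied independently with probability $1/(2^{k-1}-1)$, and uses a hypothetical function name. It compares the probability that such a bin stays empty with $\exp(-\lambda)$.

```python
import math


def empty_bin_probabilities(k):
    """Compare the binomial and Poisson probabilities of an empty bin.

    Assumptions (for illustration): density r = 2^(k-1) * ln 2, a bin
    with round(k * r) slots, each slot occupied independently with
    probability q = 1 / (2^(k-1) - 1).
    """
    r = 2 ** (k - 1) * math.log(2)
    slots = round(k * r)
    q = 1 / (2 ** (k - 1) - 1)
    lam = slots * q                      # mean number of balls in the bin
    binomial_empty = (1 - q) ** slots    # P[Bin(slots, q) = 0]
    poisson_empty = math.exp(-lam)       # P[Po(lam) = 0]
    return binomial_empty, poisson_empty, lam


binomial_empty, poisson_empty, lam = empty_bin_probabilities(10)
print("lambda:", lam, "k ln 2:", 10 * math.log(2))
print("ratio binomial/Poisson:", binomial_empty / poisson_empty)
```

Already for $k = 10$ the two probabilities agree to within about one percent, and $\lambda$ is close to $k \ln 2$, so $\exp(-\lambda)$ is close to $2^{-k}$.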

Finally, similar arguments as in the proof of Proposition 4.3 yield $P[\sigma \in S_{g,\beta}(\Phi_d) \mid \sigma \in S_\beta(\Phi_d)] = 1 - o(1)$.
Thus, the assertion follows from (5.2)-(5.3). □
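
The two sign-based probabilities used in this proof, $1 - 2^{1-k}$ for a clause to be NAE-satisfied under the all-true assignment and $k/(2^{k-1}-1)$ for an NAE-satisfied clause to be critical, can be confirmed by brute-force enumeration for small $k$. The sketch assumes that "critical" means that exactly one literal takes the minority truth value (so that exactly one variable supports the clause), which is our reading of the term; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def clause_sign_statistics(k):
    """Enumerate all 2^k sign patterns of a k-clause under the all-true assignment.

    Returns (P[clause is NAE-satisfied], P[clause is critical | NAE-satisfied]).
    Assumption: a clause is critical iff exactly one literal is in the minority.
    """
    patterns = list(product((+1, -1), repeat=k))
    nae = [s for s in patterns if 0 < s.count(+1) < k]
    critical = [s for s in nae if min(s.count(+1), s.count(-1)) == 1]
    return Fraction(len(nae), len(patterns)), Fraction(len(critical), len(nae))


# The closed formulas hold for k >= 3 (for k = 2 both literals are "minority").
checks = {k: clause_sign_statistics(k) for k in range(3, 8)}
for k, (p_nae, p_critical) in checks.items():
    print(k, p_nae, p_critical)
```

Because both events depend on the signs only, they are independent across the $m$ clauses, which gives (5.2) and the binomial law of $X$.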
We now turn to the second moment. Fix some $\sigma \in \{0,1\}^V$, say $\sigma = \mathbf{1}$. Let $Z_{g,\beta}(t,\sigma)$ denote the
number of good $\tau \in S(\Phi_d)$ at distance $t$ from $\sigma$. Using the linearity of expectation and recalling that the set
of NAE-solutions is symmetric with respect to inversion, we obtain

$$E[Z_{g,\beta}(\Phi_d) \mid \sigma \in S_{g,\beta}(\Phi_d)] \le 2 \sum_{0 \le t \le n/2} E[Z_{g,\beta}(t,\sigma) \mid \sigma \in S_{g,\beta}(\Phi_d)]. \tag{5.4}$$

Let $I = \{t \in \mathbb{Z} : (\frac12 - 2^{-k/3})n \le t \le n/2\}$. The first two conditions from Definition 5.2 ensure that given
that $\sigma$ is good, with certainty we have

$$\sum_{t \le 0.01n} Z_{g,\beta}(t,\sigma) \le |C(\sigma)| \le E[Z_\beta(\Phi_d)] \quad \text{and} \quad \sum_{0.01n < t < (\frac12 - 2^{-k/3})n} Z_{g,\beta}(t,\sigma) = 0.$$

Hence, Lemma 5.4 and (5.4) yield

$$E[Z_{g,\beta}(\Phi_d) \mid \sigma \in S_{g,\beta}(\Phi_d)] \le (2 + o(1)) E[Z_{g,\beta}(\Phi_d)] + 2 \sum_{t \in I} E[Z_{g,\beta}(t,\sigma) \mid \sigma \in S_{g,\beta}(\Phi_d)]. \tag{5.5}$$

This reduces the proof to the analysis of the "central terms" with $t \in I$. The result of this is

Lemma 5.5. There is a constant $C' = C'(k) > 1$ such that for a random $d$ we have

$$\sum_{t \in I} E[Z_{g,\beta}(t,\sigma) \mid \sigma \in S_{g,\beta}(\Phi_d)] \le C' \cdot E[Z_{g,\beta}(\Phi_d)] \quad \text{w.h.p.} \tag{5.6}$$

Proof (sketch). This is technically the most challenging bit of this work. The argument boils down to
estimating the probability that two random $\sigma, \tau \in \{0,1\}^V$ with $\mathrm{dist}(\sigma,\tau)/n = \alpha \in [\frac12 - 2^{-k/3}, \frac12]$
simultaneously are good $\beta$-heavy solutions. To compute this probability, we need to analyze the interplay of two
occupancy problems as in the proof of Lemma 5.4 with respect to the same degree sequence $d$.

More precisely, let $B = \bigcup_{x \in V} \{x\} \times \{1, \ldots, d_x\}$ be a set of $km$ "balls". Generating $\Phi_d$ is equivalent
to drawing a random bijection $\pi : [m] \times [k] \to B$, with $\pi(i,j) = (x,l)$ indicating that $x$ is the underlying
variable of the $j$th literal of clause $i$, and independently choosing a map $s : [m] \times [k] \to \{\pm 1\}$ indicating the
signs. Further, we represent the occupancy problems for $\sigma, \tau$ by two "colorings" $g_\sigma, g_\tau : B \to \{\text{red}, \text{blue}\}$,
with $g_\sigma(x,l) = \text{red}$ indicating that the $l$th position in bin $x$ is occupied under $\sigma$ (and analogously for $\tau$).
We compute the probability $p(\alpha, g_\sigma, g_\tau)$ that $\pi, s$ induce a formula in which

• the $j$th literal supports clause $i$ under $\sigma$ iff $g_\sigma \circ \pi(i,j) = \text{red}$, and similarly for $\tau$.

• both $\sigma, \tau$ are good $\beta$-heavy solutions.

The result is that for any g^, gr the "success probability" is minimized at a = 1/2. Quantitatively, 

Ofc(A;V2'=)(a - l/2)^n\ for any g^,gr. (5.7) 



p{a,g^,gr) 

— exp 



P(l/2,5a,S'T 

On the other hand, the total number of assignment pairs satisfies 

|{(a,r):dist(.,r)=an}| _ ^n^ / M = exp(-(4 - o,(l))(a - 1/2)^, (5-8) 



|{(fj, r) :dist(fj, r) =n/2}| \anj' \n/2 

which is maximized at $\alpha = 1/2$. Combining (5.7) and (5.8), we see that for any two colorings $g_\sigma, g_\tau$
the dominant contribution to the second moment stems from $\alpha = \frac12 + O(1/\sqrt{n})$, i.e., from "perfectly
decorrelated" $\sigma, \tau$. The assertion follows by evaluating the contribution of such $\alpha$ explicitly and summing
over $g_\sigma, g_\tau$. □
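
The description of $\Phi_d$ via a random bijection $\pi$ and a sign map $s$ in the proof sketch above translates directly into a sampling procedure. The following sketch is an illustration under the assumption that a clause may contain the same variable more than once (as the bijection description permits); the function names and the example degree sequence are hypothetical.

```python
import random
from collections import Counter


def sample_formula(degrees, k, seed=0):
    """Sample a k-CNF with prescribed degrees via a random bijection.

    `degrees` maps each variable x to d_x; the d_x must sum to k*m.
    Returns a list of m clauses, each a list of k (sign, variable) pairs.
    """
    total = sum(degrees.values())
    if total % k != 0:
        raise ValueError("the degrees must sum to a multiple of k")
    m = total // k
    rng = random.Random(seed)
    # The set B of "balls": one ball (x, l) per occurrence of variable x.
    balls = [(x, l) for x, d in degrees.items() for l in range(1, d + 1)]
    rng.shuffle(balls)  # position i*k + j of the shuffled list is pi(i, j)
    clauses = []
    for i in range(m):
        clause = []
        for j in range(k):
            x, _ = balls[i * k + j]
            clause.append((rng.choice((+1, -1)), x))  # sign s(i, j)
        clauses.append(clause)
    return clauses


def degree_sequence(clauses):
    """Recover the degree of every variable that occurs in the formula."""
    return dict(Counter(x for clause in clauses for _, x in clause))


example_degrees = {"x1": 3, "x2": 2, "x3": 1, "x4": 3}
formula = sample_formula(example_degrees, k=3, seed=1)
observed = degree_sequence(formula)
print(formula)
```

Whatever the random choices are, the sampled formula has exactly the prescribed degree sequence, which is the point of conditioning on $d$.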

Acknowledgment. The first author thanks Dimitris Achlioptas and Lenka Zdeborova for helpful discussions 
on the second moment method and the statistical mechanics work on random CSPs. 
Appendix

This appendix contains the details omitted from the extended abstract. Section A contains some preliminary
facts about random variables that will be used many times. Appendix B contains the full proof of the upper
bound claimed in Theorem 1.1 (with $\varepsilon_k$ exponentially small in $k$). Finally, in Appendices C and D we carry
out the second moment argument in full.



A Preliminaries 

The next lemma provides an asymptotically tight bound for the probability that a sum of independent and
identically distributed random variables attains a specific value. It will be an important tool in our further
analysis, since we will often be interested in the exact probabilities of rare events.

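Lemma A.1. Let $X_1, \ldots, X_n$ be independent random variables with support on $\mathbb{N}_0$ with probability
generating function $P(z)$. Let $\mu = E[X_1]$ and $\sigma^2 = \mathrm{Var}[X_1]$. Assume that $P(z)$ is an entire and aperiodic
function. Then, uniformly for all $T_0 < a < T_\infty$, where $T_x = \lim_{z \to x} \frac{zP'(z)}{P(z)} \ge 0$, as $n \to \infty$

$$\Pr[X_1 + \ldots + X_n = an] = (1 + o(1)) \frac{1}{\zeta\sqrt{2\pi n \xi}} \left(\frac{P(\zeta)}{\zeta^a}\right)^n, \tag{A.1}$$

where $\zeta$ and $\xi$ are the solutions to the equations

$$\frac{\zeta P'(\zeta)}{P(\zeta)} = a \quad \text{and} \quad \xi = \frac{d^2}{dz^2}\left(\ln P(z) - a \ln z\right)\Big|_{z=\zeta}. \tag{A.2}$$

Moreover, there is a $\delta_0 > 0$ such that for all $0 \le |\delta| \le \delta_0$ the following holds. If $a = E[X_1] + \delta\sigma$, then

$$\Pr[X_1 + \ldots + X_n = an] = (1 + O(\delta)) \frac{1}{\sqrt{2\pi n}\,\sigma} e^{-(\delta^2/2 + O(\delta^3))n}. \tag{A.3}$$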

Proof. The first statement follows immediately from Theorem VIII.8 and the remark after Example VIII.11
in [20]. To see the second statement let us write $\zeta_\delta$ for the solution to the equation $\frac{zP'(z)}{P(z)} = \mu + \delta\sigma$. Since
$P(1) = 1$ and $P'(1) = \mu$ we infer that if $\delta = 0$, then $\zeta_\delta = 1$. Moreover, a Taylor series expansion around
$z = 1$ guarantees for all $\delta$ in a bounded interval around 0 that

$$\mu + \delta\sigma = \frac{\zeta_\delta P'(\zeta_\delta)}{P(\zeta_\delta)} = \mu + \sigma^2(\zeta_\delta - 1) + O((\zeta_\delta - 1)^2).$$

Since $\sigma^2 = P''(1) + \mu - P'(1)^2$, for all $\delta$ in a bounded interval around 0 we have that $\zeta_\delta = 1 + \delta/\sigma + O(\delta^2)$.
In order to show (A.3) we evaluate the right-hand side of (A.1) at $\zeta = \zeta_\delta$. Again a Taylor series
expansion around $z = 1$ guarantees that

$$\frac{P(\zeta_\delta)}{\zeta_\delta^a} = P(1) + (\zeta_\delta - 1)(P'(1) - aP(1)) + \frac{(\zeta_\delta - 1)^2}{2}\left(P''(1) + P(1)a^2 + P(1)a - 2P'(1)a\right) + O(\delta^3)$$

$$= 1 - \delta^2 + \frac{\delta^2}{2\sigma^2}\left(P''(1) + \mu - \mu^2 + O(\delta)\right) + O(\delta^3) = 1 - \frac{\delta^2}{2} + O(\delta^3).$$

The exponential term in (A.3) is then obtained by using the fact $1 - x = e^{-x - \Theta(x^2)}$. Finally, note that

$$\frac{d^2}{dz^2}\left(\ln P(z) - a \ln z\right) = \frac{P''(z)}{P(z)} - \frac{P'(z)^2}{P(z)^2} + \frac{a}{z^2}.$$

By applying again Taylor's Theorem to this function we obtain after some elementary algebra (details
omitted) that the value of this function at $\zeta = \zeta_\delta$ equals $\sigma^2 + O(\delta)$, and the proof of (A.3) is completed. □
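
As a numerical illustration of (A.3), consider the special case where the $X_i$ are Poisson with mean $\mu$, so that the sum is exactly $\mathrm{Po}(n\mu)$ and $\sigma = \sqrt{\mu}$. The sketch below (function names and parameter values are our own choices) divides the exact point probability by the leading term $\frac{1}{\sqrt{2\pi n}\,\sigma} e^{-\delta^2 n/2}$; by (A.3) the ratio should be $1 + O(\delta)$ up to the factor $e^{O(\delta^3) n}$.

```python
import math


def log_poisson_pmf(mean, j):
    """Natural logarithm of Pr[Po(mean) = j], computed via lgamma."""
    return -mean + j * math.log(mean) - math.lgamma(j + 1)


def local_limit_ratio(mu, n, delta):
    """Exact Pr[X_1 + ... + X_n = a*n] over the leading term of (A.3).

    Assumption: X_i ~ Po(mu) i.i.d., hence sigma = sqrt(mu) and the sum
    is Po(n * mu); a = mu + delta * sigma as in the lemma.
    """
    sigma = math.sqrt(mu)
    target = round((mu + delta * sigma) * n)
    exact = math.exp(log_poisson_pmf(mu * n, target))
    leading = math.exp(-delta ** 2 * n / 2) / (math.sqrt(2 * math.pi * n) * sigma)
    return exact / leading


ratio_center = local_limit_ratio(4.0, 1000, 0.0)
ratio_shifted = local_limit_ratio(4.0, 1000, 0.05)
print(ratio_center, ratio_shifted)
```

For $\delta = 0$ the ratio is essentially 1, and for a small positive $\delta$ it stays within a few percent of 1, in line with the error terms in (A.3).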

The next statement provides tight asymptotic bounds for binomial coefficients. 

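Proposition A.2. Let $0 < a < 1/2$ and $-1/2 < \varepsilon < 1/2$ be such that $0 < a + \varepsilon < 1$. Then, as $N \to \infty$

$$\binom{N}{aN} = \frac{1 + o(1)}{\sqrt{2\pi f(a) N}}\, e^{H(a)N} \quad \text{and} \quad \binom{N}{(a+\varepsilon)N} = \frac{1 + o(1)}{\sqrt{2\pi f(a+\varepsilon) N}}\, e^{\left(H(a) + \varepsilon \ln\left(\frac{1-a}{a}\right) + O(\varepsilon^2/a)\right)N},$$

where $H(x) = -x \ln x - (1-x)\ln(1-x)$ denotes the entropy function and $f(x) = x(1-x)$.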



Proof. The first statement is well-known, see e.g. [20]. To see the second statement, note first that
$H'(x) = \ln\left(\frac{1-x}{x}\right)$ and $H''(x) = (x(x-1))^{-1}$, both valid in $(0,1)$. Then, Taylor's Theorem guarantees that

$$H(a + \varepsilon) = H(a) + \varepsilon H'(a) + O(\varepsilon^2/a),$$

from which the second statement follows immediately. □
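
The first estimate of Proposition A.2 can be checked numerically. The sketch below (function names and the sample values $N = 1000$, $a = 0.3$ are arbitrary choices) compares the exact binomial coefficient, evaluated through `math.lgamma`, with $e^{H(a)N}/\sqrt{2\pi f(a) N}$.

```python
import math


def entropy(x):
    """Entropy function H(x) = -x ln x - (1 - x) ln(1 - x) for 0 < x < 1."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def log_binomial(n, j):
    """Natural logarithm of the binomial coefficient C(n, j)."""
    return math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)


def binomial_ratio(N, a):
    """Exact C(N, aN) divided by exp(H(a) N) / sqrt(2 pi a (1 - a) N)."""
    j = round(a * N)
    approx_log = entropy(a) * N - 0.5 * math.log(2 * math.pi * a * (1 - a) * N)
    return math.exp(log_binomial(N, j) - approx_log)


ratio = binomial_ratio(1000, 0.3)
print(ratio)
```

Working with logarithms avoids overflow: the coefficient itself has several hundred digits, while the ratio is a number very close to 1.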



B The upper bound on $r_{k\text{-NAE}}$

To prove the upper bound on $r_{k\text{-NAE}}$ we are going to combine the upper bound on the expectation of $Z_\beta$
from Proposition 4.2 with a lower bound on the cluster sizes of $\beta$-heavy assignments, see Lemma B.3. Let
$\lambda = kr/(2^{k-1} - 1)$. First of all, we fill the missing pieces in the proof of Proposition 4.2. The next lemma
provides the analysis for the balls-into-bins game that was omitted in that proof.

Lemma B.1. Let $X \sim \mathrm{Bin}(m, k/(2^{k-1} - 1))$. We throw $X$ balls into $n$ bins uniformly at random. Let $B_i$
denote the number of bins that receive $i$ balls. Then, for any $-3/2 < \beta < 1$

$$\frac1n \ln \Pr\left[B_0 = (1-\beta)e^{-\lambda}n\right] = \frac1n \ln \Pr\left[\mathrm{Bin}(n, e^{-\lambda}) = (1-\beta)e^{-\lambda}n\right] + O_k(k4^{-k}). \tag{B.1}$$



Proof. We shall estimate the desired probability by conditioning on any specific value $x$ of $X$. Let $F_i$ be
the number of balls in the $i$th bin, and let $P_1, \ldots, P_n$ be independent Poisson distributed random variables
with mean $\lambda$. It is well-known and easy to verify that the distribution of $(F_1, \ldots, F_n)$ is the same as the
distribution of $(P_1, \ldots, P_n)$, conditioned on the event $A(x) = $ "$\sum_{1 \le i \le n} P_i = x$". So, if we denote by $N_0$
the number of $P_i$'s that are equal to 0, we infer that

$$\Pr\left[B_0 = (1-\beta)e^{-\lambda}n \mid X = x\right] = \Pr\left[N_0 = (1-\beta)e^{-\lambda}n \mid A(x)\right].$$

By Bayes' rule this equals

$$\Pr\left[B_0 = (1-\beta)e^{-\lambda}n \mid X = x\right] = \Pr\left[N_0 = (1-\beta)e^{-\lambda}n\right] \cdot \frac{\Pr[A(x) \mid N_0 = (1-\beta)e^{-\lambda}n]}{\Pr[A(x)]}.$$

Note that $N_0 \sim \mathrm{Bin}(n, e^{-\lambda})$. Furthermore, if we denote by $P'_1, \ldots, P'_{\xi n}$, where $\xi = 1 - (1-\beta)e^{-\lambda}$,
independent Poisson variables that are conditioned on being at least 1, then the above equation implies that

$$\Pr\left[B_0 = (1-\beta)e^{-\lambda}n\right] = \Pr\left[\mathrm{Bin}(n, e^{-\lambda}) = (1-\beta)e^{-\lambda}n\right] \sum_x \frac{\Pr\left[\sum_{i=1}^{\xi n} P'_i = x\right]}{\Pr[\mathrm{Po}(\lambda n) = x]} \Pr\left[\mathrm{Bin}(m, k/(2^{k-1} - 1)) = x\right]. \tag{B.2}$$
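
The "well-known and easy to verify" fact used in this proof, namely that the loads of $n$ bins after throwing $x$ balls have the same law as $n$ independent Poisson variables conditioned on summing to $x$, can be checked exactly on a tiny instance. The following sketch (hypothetical function names; $n = 3$ bins and $x = 4$ balls chosen arbitrarily) compares the two probabilities for every load vector and for two different Poisson means, since the identity does not depend on the mean.

```python
import math
from itertools import product


def multinomial_prob(counts):
    """Probability of the bin loads `counts` when sum(counts) balls are
    thrown independently and uniformly into len(counts) bins."""
    x, n = sum(counts), len(counts)
    ways = math.factorial(x)
    for c in counts:
        ways //= math.factorial(c)  # exact: partial quotients are integers
    return ways / n ** x


def conditioned_poisson_prob(counts, lam):
    """Probability of `counts` for independent Po(lam) variables,
    conditioned on their sum being sum(counts)."""
    x, n = sum(counts), len(counts)
    joint = math.prod(math.exp(-lam) * lam ** c / math.factorial(c) for c in counts)
    total = math.exp(-n * lam) * (n * lam) ** x / math.factorial(x)
    return joint / total


max_gap = max(
    abs(multinomial_prob(c) - conditioned_poisson_prob(c, lam))
    for lam in (0.7, 2.5)
    for c in product(range(5), repeat=3)
    if sum(c) == 4
)
print(max_gap)
```

The gap is at the level of floating-point rounding, reflecting that the two distributions coincide exactly.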
In order to complete the proof of (B.1) we will derive in the sequel appropriate bounds for the right-hand
side of the above equation. First, to obtain a lower bound, note that $\xi < \lambda$, since $\xi < 1$ and $\lambda = k \ln 2 +
O_k(k2^{-k})$, which is $> 1$ for sufficiently large $k$. Thus, we can obtain a lower bound for (B.2) by considering
only the term in the sum that corresponds to $x = \lambda n$. Since $E[\mathrm{Po}(\lambda n)] = E[\mathrm{Bin}(rn, k/(2^{k-1} - 1))] = \lambda n$,
we infer by applying Lemma A.1 that

$$\Pr[\mathrm{Po}(\lambda n) = \lambda n] = \Theta(n^{-1/2}) \quad \text{and} \quad \Pr[\mathrm{Bin}(rn, k/(2^{k-1} - 1)) = \lambda n] = \Theta(n^{-1/2}).$$

It remains to bound $\Pr[\sum_{i=1}^{\xi n} P'_i = \lambda n]$. Note that $E[P'_1] = \frac{\lambda}{1 - e^{-\lambda}}$. If we write $N = \xi n$, then

$$\Pr\left[\sum_{i=1}^{\xi n} P'_i = \lambda n\right] = \Pr\left[\sum_{i=1}^{N} P'_i = \left(E[P'_1] + O_k(k2^{-k})\right) N\right],$$

i.e., we require that the sum of the $P'_i$'s deviates from the expected value by $O_k(k2^{-k}n)$. By applying
Lemma A.1 where we set $\delta = O_k(k^{1/2}2^{-k})$, we conclude that the right-hand side of (B.2) is at least
$\exp\{-O_k(k4^{-k}n)\}$. This shows the lower bound in (B.1).

In the remainder of this proof we will show an upper bound for the right-hand side of (B.2). To this
end, we will argue that the ratio $\Pr[\mathrm{Bin}(rn, k/(2^{k-1} - 1)) = \gamma\lambda n] / \Pr[\mathrm{Po}(\lambda n) = \gamma\lambda n]$ is essentially
bounded for all $x$ in the given range, from which the claim immediately follows. More specifically, let us
write $x = \gamma\lambda n$, where $\xi/\lambda \le \gamma \le r/\lambda$. By applying Stirling's formula $N! = (1 + o(1))\sqrt{2\pi N}(N/e)^N$
we infer that

$$\Pr[\mathrm{Po}(\lambda n) = \gamma\lambda n] = \Theta(1)\, n^{-1/2} \exp\{\lambda n(-1 + \gamma - \gamma \ln \gamma)\}. \tag{B.3}$$

Moreover, by abbreviating $p = k/(2^{k-1} - 1)$ we get

$$\Pr[\mathrm{Bin}(rn, p) = \gamma\lambda n] = \binom{rn}{\gamma\lambda n} p^{\gamma\lambda n} (1-p)^{rn - \gamma\lambda n}.$$

Since $\binom{N}{\alpha N} \le e^{H(\alpha)N}$, where $H$ denotes the entropy function, we obtain after some elementary algebra

$$\Pr[\mathrm{Bin}(rn, p) = \gamma\lambda n] \le \exp\left\{\lambda n\left(-\gamma \ln \gamma - \frac{1 - \gamma p}{p} \ln \frac{1 - \gamma p}{1 - p}\right)\right\}.$$

By combining this with (B.3) we obtain the estimate

$$\frac{\Pr[\mathrm{Bin}(rn, k/(2^{k-1} - 1)) = \gamma\lambda n]}{\Pr[\mathrm{Po}(\lambda n) = \gamma\lambda n]} \le O(n^{1/2})\, e^{f(\gamma)\lambda n}, \quad \text{where } f(\gamma) = 1 - \gamma - \frac{1 - \gamma p}{p} \ln \frac{1 - \gamma p}{1 - p}.$$

Recall that $0 < \xi/\lambda \le \gamma \le r/\lambda = 1/p$, and note that both $f(0)$ and $f(1/p)$ are $\le 0$. Moreover, $f$ has an
extremal point at $\gamma = 1$, where $f(1) = 0$. Thus, for all $\gamma$ in the considered range we have that $f(\gamma) \le 0$,
which implies that the right-hand side of (B.2) is bounded from above by at most a polynomial in $n$. This
completes the proof of the lemma. □
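
The final step of the proof rests on the function $f(\gamma) = 1 - \gamma - \frac{1-\gamma p}{p}\ln\frac{1-\gamma p}{1-p}$ being non-positive on $(0, 1/p)$ with its maximum $f(1) = 0$; indeed $f'(\gamma) = \ln\frac{1-\gamma p}{1-p}$ changes sign from positive to negative at $\gamma = 1$. The sketch below evaluates $f$ on a grid for one sample value of $k$ (the choice $k = 10$ and the grid are arbitrary; the function name `f` merely mirrors the notation of the proof).

```python
import math


def f(gamma, p):
    """The exponent f(gamma) from the proof of Lemma B.1, for 0 < gamma < 1/p."""
    u = 1 - gamma * p
    return 1 - gamma - (u / p) * math.log(u / (1 - p))


k = 10
p = k / (2 ** (k - 1) - 1)
# Grid over the open interval (0, 1/p); the endpoint 1/p is excluded
# because the logarithm is singular there.
grid = [i / 1000 * (1 / p) for i in range(1, 1000)]
values = [f(g, p) for g in grid]
print("max of f on the grid:", max(values), "f(1):", f(1.0, p))
```

On the grid the maximum is a tiny negative number attained near $\gamma = 1$, consistent with the claim that the ratio of the binomial and Poisson probabilities grows at most polynomially in $n$.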

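The proof of Proposition 4.2 is then completed by applying the following statement.

Lemma B.2. There is a $k_0 > 3$ such that the following is true. Let $Y \sim \mathrm{Bin}(n, e^{-\lambda})$. For any $-3/2 < \beta \le 1$

$$\frac1n \ln \Pr\left[Y = \lfloor (1-\beta)e^{-\lambda}n \rfloor\right] = f(\beta) + O_k(4^{-k}), \quad \text{where } f(\beta) = -\left((1-\beta)\ln(1-\beta) + \beta\right)e^{-\lambda}.$$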
Proof. Let us abbreviate $\xi = (1-\beta)e^{-\lambda}$. We will assume that $\xi n = \lfloor \xi n \rfloor$, i.e., that $\beta = 1 - N(e^{-\lambda}n)^{-1}$
for some $N \in \mathbb{N}_0$. To see that this is sufficient, note that by Taylor's Theorem, for any $\beta < 1$ and any
$|\varepsilon_n| \le (e^{-\lambda}n)^{-1}$ such that $\beta + \varepsilon_n \le 1$ there is a $\delta \in [\beta, \beta + \varepsilon_n]$ such that

$$f(\beta + \varepsilon_n) = f(\beta) + \varepsilon_n f'(\delta) = f(\beta) + \varepsilon_n e^{-\lambda} \ln(1 - \delta) = f(\beta) + O_k(4^{-k}).$$

With the above assumption we proceed with the proof of the claim. The definition of the binomial
distribution implies

$$\Pr[Y = (1-\beta)e^{-\lambda}n] = \binom{n}{\xi n} e^{-\lambda \xi n} (1 - e^{-\lambda})^{(1-\xi)n}. \tag{B.4}$$

If $\beta = 1$, then $\xi = 0$ and the above expression simplifies to

$$(1 - e^{-\lambda})^n = \exp\{n \ln(1 - e^{-\lambda})\} = \exp\{n(-e^{-\lambda} - \Theta(e^{-2\lambda}))\}.$$

Since $f(1) = -e^{-\lambda}$ and $\lambda = k \ln 2 + O(k2^{-k})$, we infer that the statement is true for $\beta = 1$. It remains to
treat the case $\beta < 1$. Standard bounds for the binomial coefficients imply

$$\binom{n}{\xi n} = e^{nH(\xi) + o(n)}, \quad \text{where } H(x) = -x \ln x - (1-x)\ln(1-x).$$

Using the estimate $\ln(1-x) = -x - \Theta(x^2)$, which is valid for $|x| < 1$, we infer after some elementary
algebra that

$$\frac1n \ln \binom{n}{\xi n} = e^{-\lambda}\left((1-\beta)\lambda - (1-\beta)\ln(1-\beta) + (1-\beta)\right) + O_k(4^{-k}). \tag{B.5}$$

Similarly, the second and the third term in (B.4) can be estimated with

$$\frac1n \ln\left(e^{-\lambda \xi n}(1 - e^{-\lambda})^{(1-\xi)n}\right) = -e^{-\lambda}\left((1-\beta)\lambda + 1\right) + O_k(4^{-k}).$$

By plugging this fact together with (B.5) into (B.4) we finally obtain the desired statement. □
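
The statement of Lemma B.2 can be cross-checked against the standard large-deviations rate of the binomial distribution: $\frac1n \ln \Pr[\mathrm{Bin}(n,p) = \xi n]$ converges to $-\mathrm{KL}(\xi \,\|\, p) = \xi\ln\frac{p}{\xi} + (1-\xi)\ln\frac{1-p}{1-\xi}$. With $p = e^{-\lambda}$ and $\xi = (1-\beta)p$, a short expansion suggests that this rate differs from $f(\beta) = -((1-\beta)\ln(1-\beta) + \beta)p$ by about $\beta^2 p^2/2$, which is $O(4^{-k})$ when $p \approx 2^{-k}$. The sketch below (hypothetical function names, sample value $p = 2^{-10}$) verifies this numerically.

```python
import math


def exact_rate(beta, p):
    """Exponential rate -KL(xi || p) of Pr[Bin(n, p) = xi * n], xi = (1 - beta) * p."""
    xi = (1 - beta) * p
    rate = (1 - xi) * math.log((1 - p) / (1 - xi))
    if xi > 0:  # the term xi * ln(p / xi) vanishes for xi = 0
        rate += xi * math.log(p / xi)
    return rate


def f(beta, p):
    """The function f(beta) of Lemma B.2 with e^{-lambda} replaced by p."""
    entropy_term = (1 - beta) * math.log(1 - beta) if beta < 1 else 0.0
    return -(entropy_term + beta) * p


p = 2.0 ** -10
betas = [-1.4, -0.5, 0.0, 0.5, 0.9, 1.0]
worst = max(abs(exact_rate(b, p) - f(b, p)) for b in betas)
print("largest gap:", worst, "p^2:", p * p)
```

The gap is of order $p^2$ for every tested $\beta$, matching the $O_k(4^{-k})$ error term of the lemma.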

We proceed with the proof of the upper bound in Theorem 1.1. Let $Z_{\beta,\gamma}$ denote the number of $\beta$-heavy
solutions $\sigma$ such that $\frac1n \log_2 |C(\sigma)| \le (1 - \beta - \gamma)e^{-\lambda}$. The following statement provides an upper bound
for the expected number of such solutions.

Lemma B.3. For any —3/2 < /3 < 1 and 7 > k^^'^e~^ we have for sufficiently large k 

-InE [Z^,^] < i In E [Z^] - ^7^"^ 
n n 

Proof. Let $\sigma \in \{0,1\}^V$ be an assignment; for the sake of concreteness, assume that $\sigma = \mathbf{1}$. In order to
bound $E[Z_{\beta,\gamma}]$ it is sufficient to estimate the probability of the event

$$\mathcal{E} = \left\{\frac1n \log_2 |C(\sigma)| \le (1 - \beta - \gamma)e^{-\lambda}\right\}$$

given that $\sigma \in S_\beta(\Phi)$. Let $\mathcal{F}(\sigma)$ denote the set of free variables, and denote by $\mathcal{X}$ the set of clauses that
do not contain both a positive and a negative literal whose underlying variable is in $V \setminus \mathcal{F}(\sigma)$. Then only
the clauses in $\mathcal{X}$ impose constraints on the free variables. We decompose $\mathcal{X}$ into $k - 1$ subsets $\mathcal{X}_2, \ldots, \mathcal{X}_k$,
where $\mathcal{X}_i$ is the set of all clauses in $\mathcal{X}$ that contain $i$ variables from $\mathcal{F}(\sigma)$. Note that $\mathcal{X} = \bigcup_{i=2}^k \mathcal{X}_i$, as any clause
with only one variable from $\mathcal{F}(\sigma)$ necessarily contains both positive and negative literals whose underlying
variables are not free. Let $X_i = |\mathcal{X}_i|$. Since only the clauses in $\mathcal{X}$ impose constraints on variables from
$\mathcal{F}(\sigma)$ that occur in them, we infer that

$$\log_2 |C(\sigma)| \ge |\mathcal{F}(\sigma)| - Y, \quad \text{where } Y = \sum_{i=2}^k i X_i.$$

In the remainder we will show that 

Y > 7e^^n | a £ Sf^i^) 



— InPr 

n 



Ink X 

< -^7e^\ (B.6) 



from which the statement in the lemma follows immediately. 

Note that the set $\mathcal{F}(\sigma)$ is determined by the critical clauses only. Therefore, given that $\sigma \in S_\beta(\Phi)$,
the variables that occur in the non-critical clauses are independent and uniformly distributed over the set of
all variables. Similarly, given that $\sigma \in S_\beta(\Phi)$ the $k - 1$ variables that contributed the "majority value" to
each critical clause are independently uniformly distributed. Therefore, $X_i$ is stochastically dominated by a
binomial random variable

$$X'_i \sim \mathrm{Bin}(m, p_i), \quad \text{where } p_i = 2^{-k+1} \cdot 2^i \binom{k}{i} \left((1-\beta)\exp(-\lambda)\right)^i.$$

Our assumption $-3/2 < \beta < 1$ guarantees that $(1-\beta)e^{-\lambda} < 3e^{-\lambda} < 3 \cdot 2^{-k}$. By using the estimate
$\binom{k}{i} \le k^i$ we infer that

$$p_i \le 2^{-k+1} \cdot 2^i \binom{k}{i} \left((1-\beta)e^{-\lambda}\right)^i \le 2^{-k+1} (6k2^{-k})^i. \tag{B.7}$$



Moreover, note that the Xi are negatively correlated. Indeed, let Xij be the indicator for the event that 
the clause G X^. Then, for all i i' we have E[Xj jXj/j] = < E[Xjj]E[Xj/j], and otherwise, if 
7^ (^')/). then Xij and Xiiji are independent. Thus, for any 6 > 0, Markov's inequality impUes with 



t = 76"^ 



Fr[Y>t\a£ Sp{^)] < e~^' JJ E[e^'^'] < e^^' JJ E[e^'^^] < e~^* HiPiC^' + I - Pi 

1=2 i=2 i=2 

Let us fix (5 = ^ In k. By the arithmetic-geometric mean inequality we obtain that the expression in the 
previous equation is at most 

km , , / , , -, I , • r ■ \ km 



Since t = je~^ > k^/'^4r^ , for sufficiently lai^ge k we get (IB.6I ). and the proof is completed. □ 
Consider the function 

=M/5)-(l-/3)e"Mn2, 

where 

= + and /(/3) = -((1 - /3) ln(l - /3) + /3)e-\ 

Let r* be the least density r such that g{l5) < — A;^4~'^+^ for all /3 > —1. Since g is maximized for /3 = 1/2, 

~ 3- In 2 _ 1 
2k 2* 



where S'(l/2) = J"^ - ^e"^, it is easily verified that 



2^-1 ln2 - ('— + - ) +Ok{k^2-'^). 
Proposition B.4. With $r = r^*$ the random formula $\Phi$ does not have an NAE-solution w.h.p.

Proof. Let Z</3 be the number of solutions that are /3' -heavy for some /3' < /?. In order to prove that 
Z<i3 = w.h.p. for all (3 we proceed as follows. Let —3/2 = /3o < • • • < /?£ = 1 be a sequence such that 
|/3j — A+il < S for all i, where 6 = 2^^*^. We are going to show inductively that ^</3- = w.h.p.; by the 
previous discussion we may assume that this is true for i = 0. 

Let us assume for the induction step that i is such that w.h.p. Z^js- = 0. Let 70 = fc^e"'*', and let Z' be 
the number of solutions that ai^e /3'-heavy for some /?' > /3j and such that ^ log2 \C{a)\ > {I — — jo)e~^. 
Then, by applying Proposition 14.21 and using that h{x) is monotone increasing for x < and monotone 
decreasing for x > we obtain 

- lnE[Z'] < max/i(/3) + Ok{5 + k4~') = OM-') + I ^ 0' . 

Let us first consider the case /3j < 0. The choice of r* guarantees that 51(0) = h{0) — In 2 < —k^A~^~^^. 
Since Z' > Q implies Z' > exp{n(l — — 7o)e~^ In 2} > exp{n(l — 7o)e^^ In 2} or otherwise Z<[s^ > 
we infer for sufficiently large k that 

Pr[Z' > 0] < Pr[Z</3^ > 0] +E[Z']exp{-n(l -7o)e-^ln2} = o(l). 

On the other hand, if /3j > 0, then again the choice of r* is such that g{l3i) = h{/3i) — (1 — Pi)e~^ hi 2 < 
_^3^-fe+i jj^yg^ fQj- sufficiently large k 

ilnE[Z'l < -A;34-'=+i + (1 -/3i)e-^ln2 + Ofc(M-*^) < -fc^4-*^' + (1 - ft - 70)6"^ In 2. 
n 

So, since Z' > implies Z' > exp{n(l — (3i — 7o)e~^ In 2} or otherwise ^</3, > we infer that 

Pr[Z' > 0] < Pr[Z<ft > 0] + E[Z'] exp{-7i(l - ft - 70)6"^ In 2} = o(l). 

Thus, in both cases we have that Pr[Z' > 0] = o(l). In remains to consider all satisfying assignments such 
that ^ log2 \C{a) \ < (1 — ft — 7o)e~'^. More specifically, let be the number of solutions that are ft-heavy 
for some ft < /?' < ft+i and such that 

(1 - ft - 7i+i)e-^ < ^ log2 \C{a)\ < (1 - ft - ^j)e-\ 
where 'Jj+i = l^yj. Choose /3' be such that Sjji{^) n C{a) is maximized. Then 

|5«/(^)nC((j)| > i^^. (B.8) 
n 

Since Z<ii. = w.h.p., we may assume that /3' > ft. There ai^e two cases to consider. 

Case 1: 1 — ft — 7^+1 > 1 — /3'. We will show that in this case the number of /3'-heavy assignments is larger 
than the expected value by at least an exponential factor. Indeed, our assumption on g implies for sufficiently 
large k that 

^InE [Z^>] = /i(/3') + Okiki"^) < -k^i-'' + (1 - /3')e~^ln2 < -kH~'' + (1 - ft - 7j+i)e^^ln2. 
However, if (IB. 81 ) holds then 

- In Zp, > - In \C{a)\ - o(l) = (1 - ft - 7j+i)e-^ ln2 - o(l). 
n n 
By Markov's inequality, the probability of this event is $\exp(-\Omega(n))$.



Case 2: 1 — /3j — jj+i < 1 — /?'. The assumption guarantees the existence of a 7' > such that 

l-A-7i+i = l-/3'-7'. 
In this case we will show that the number of solutions in SjS'^yi^) is larger than the expected value by at 



least an exponential factor. Equation (IB.8I ) implies that 

-lnZ«/y > iln|C(cj)| -0(1) = (l-/3'-7')e~^ln2-o(l). (B.9) 
n n 

If 7' > k^l'^e~^, then by Lemma IB3] and our assumption on g 

i InE [Z^, y] < h{P') + 0,{kA-^) - i^ye-^ < (1 - 0)e-^ In 2 - ^^l'e-\ 

Thus, by applying (IB .91 ). we infer that Zj^/^y > exp(J7(n))E [Z^/^/] . By Markov's inequality, the proba- 
bility of this event is exp(— I7(n)). On the other hand, if 7' < fc^/^e"^, then for sufficiently large k 

-InE \ZB,y] < h((3') + Ok(k4~'') < -kH-''+^ + (1 - P')e~^ln2 < -k^i"'' + (1 - /3' - -f')e^^ln2. 
n 

Thus, again by applying (IB.9I ). we infer that also in this case Z^',^' > exp(i7(n))E [Z^/ ,^/] , and Markov's 
inequality asserts that the probability of this event is exp(— i7(n)). 



Since the probability that either case occurs is exp(— i7(?i)), we conclude that the same is true of the event 
"Zj > 0". Taking the union bound over j then completes the induction step, i.e., Z<^^^j = w.h.p. □ 

Finally, the upper bound on $r_{k\text{-NAE}}$ claimed in Theorem 1.1 follows directly from Proposition B.4.

C Proof of the lower bound 
C.l Outline 

Let $d$, $D$ be as in Section 5. In the extended abstract, we presented a slightly streamlined definition of
"good". Technically it will be more convenient to work with the following definition. (It will emerge later
that the two definitions are equivalent.) Recall that $\lambda = kr/(2^{k-1} - 1)$.

Definition C.1. We call a solution $\sigma \in \{0,1\}^V$ of $\Phi_d$ $\beta$-good if it satisfies the following conditions.

1. $\sigma$ is $\beta$-heavy and the total number of critical clauses is equal to $\lambda n$.

2. No variable supports more than $3k$ clauses.

3. We have 



n 



|r G Si^d) ■■ distia, T)/n < 1 - 2-"/'^ 



< (1 - /3) exp(-A) In 2 + Ok{k^H-'') 



Let $Z_\beta$ be the number of $\beta$-good solutions. As a first step, we determine the expectation of $Z_\beta$.

Proposition C.2. Suppose that $d$ is chosen from the distribution $D$. Then w.h.p.



- lnE[Zp] > ^^-^ + /(/3) - O.ik'H-'^) 
Let us fix an assignment $\sigma \in \{0,1\}^V$, say $\sigma = \mathbf{1}$. Moreover, let $\mathcal{E}$ be the event that $\sigma$ is a $\beta$-good
solution. Let $Z_\beta(t)$ be the number of $\beta$-good solutions $\tau \in S(\Phi_d)$ such that $\mathrm{dist}(\sigma, \tau) = t$. Then the
symmetry properties of $\Phi_d$ imply the following.



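Fact C.3. For any $d$ we have $E[Z_\beta^2] = E[Z_\beta] \cdot E[Z_\beta \mid \mathcal{E}]$.

Thus, we need to compare $E[Z_\beta \mid \mathcal{E}]$ with $E[Z_\beta]$. Let $\delta = 2^{-k/3}$. By the linearity of expectation and by the
symmetry of $\Phi_d$ with respect to inversion, for any $d$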

n n/2 

E[zp\s] = Y,^[Zp{t)\s\<2Y,mp{t)m 

t=0 t=0 

= 2 E[^^(t)|i;] + 2 Yl ^[Mt)\^] 

t<{^-S)n (|-<5)n<t<|n 



< 2 



|r GcS(^d) :dist(a, T)/n< ^-2-'=/3 1 +2 ^ E[^^(t)|r] 



< 2 exp 

by the definition of /3-good. Let 



(i-5)n<i<in 

n((l -/3)exp(-A)ln2-Ofc(rt-'^))] +2 ^ E[Z/3(t)|i;] (C.l) 

(i-5)n<t<in 



2*^"^ In 2 



ln2 1 

— + 4 



A:i^2-^ 



Lemma C.4. For any r < r* there exists < /3 < ^ such that for d chosen from D w.h.p. 

E [Zp] > exp [n((l - (3)e^^ In 2) + k^'^2~''+^ 
Proof. This follows from Proposition |C .21 and a little bit of calculus. 



□ 



As a next step, we are going to bound the second summand in (C.1). This is technically the most
demanding part of this work. In Appendix D we are going to prove the following.

Lemma C.5. Let $\delta = 2^{-k/3}$. There is a number $C = C(k)$ such that for a degree sequence $d$ chosen from
$D$ we have w.h.p.

$$\sum_{(\frac12 - \delta)n \le t \le \frac12 n} E[Z_\beta(t) \mid \mathcal{E}] \le C \cdot E[Z_\beta].$$

Corollary C.6. For any r < r* there is < (3 < ^ such that E [Zj^ \U] < (7 • E [Zj^] for some constant 
C = C{k) > 1. 



Proof. This follows directly from (C.1), Lemma C.4 and Lemma C.5.



□ 



Proof of Theorem 1.1 (lower bound). By Corollary C.6 and the Paley-Zygmund inequality, for any $r < r^*$
for a random $d$ chosen from the distribution $D$ we have w.h.p.

$$P[\Phi_d \text{ has an NAE-solution}] \ge P[Z_\beta > 0] \ge 1/C. \tag{C.2}$$

Since $D$ is precisely the distribution of the degree sequence of the uniformly random formula $\Phi$, we have

$$E_d\left[P[\Phi_d \text{ has an NAE-solution}]\right] = P[\Phi \text{ has an NAE-solution}],$$

where the expectation on the left hand side ranges over $d$ chosen from $D$. Therefore, (C.2) implies that

$$P[\Phi \text{ has an NAE-solution}] \ge \frac1C - o(1), \tag{C.3}$$

which remains bounded away from 0 as $n \to \infty$. Hence, (C.3) implies that $r_{k\text{-NAE}} \ge r^*$, as the $k$-NAESAT
threshold is sharp. □

C.2 Proof of Proposition C.2

We begin with the following simple observation.

Lemma C.7. For any $d$ and any $\sigma \in \{0,1\}^V$ we have $P[\sigma \in S(\Phi_d)] = (1 - 2^{1-k})^m$.

Proof. We may assume without loss of generality that $\sigma = \mathbf{1}$. Then $\sigma$ is a solution iff each clause has both a positive
and a negative literal. Since the signs of the literals are chosen uniformly and independently, the assertion
follows. □

We defer the proof of the following result to Section C.3.

Proposition C.8. Let dbe a chosen from D. Then w.h.p. we have 

i InP [cr has Properties 1. and 2. from Definition\l]\ a G S{^d)] = + Ok{k4r^). 
n 

To continue, we need the following basic fact about the random degree sequence $d$. For a set $S \subset V$ we
let $\mathrm{Vol}(S) = \sum_{x \in S} d_x$.

Lemma C.9. Let $d$ be chosen from $D$. Then w.h.p. the following is true.

$$\text{For any set } S \subset V \text{ we have } \mathrm{Vol}(S) \le 10 \max\{kr|S|,\ |S| \ln(n/|S|)\}. \tag{C.4}$$

Proof. For any fixed $S \subset V$ the volume $\mathrm{Vol}(S) = \sum_{x \in S} d_x$ is asymptotically a sum of independent Poisson variables
$\mathrm{Po}(kr)$. Hence, $\mathrm{Vol}(S)$ is asymptotically $\mathrm{Po}(|S|kr)$, and the lemma follows from a straight first moment argument. □

Let us call $S \subset V$ dense if each variable in $S$ supports at least two clauses that each feature another
variable from $S$.

Lemma C.10. Let $d$ be chosen from $D$ and let $\sigma \in \{0,1\}^V$. Let $A$ be the event that $\sigma \in S(\Phi_d)$ and that $\sigma$
satisfies conditions 1.-2. in Definition C.1. Then w.h.p.

P \there is a dense S dV , \S\ < n/k^ \ A] = o(l). 

Proof. We may assume that d satisfies (IC.4I) . Let 1^(5) be the event that S* C F is dense. We claim that 

^ - (fcrn/2)2|5| - \^ km J 
Indeed, the factor /c^l"^! accounts for the number of ways to choose the two relevant clauses supported by each
variable, and the second factor bounds the probability that each of these clauses contains another occurrence 
of a variable from S. Now, (IC.4I) yields 

For < s < let be the number of sets S of size l^l = sn for which occurs. Then 

E[Xs] < (^^^ [2A;2^1n(l/s)]^'" < [J • {2kh / s)^^'' ^ (4efcSln2(l/s))'" = o(l). 

Summing over all possible s and using Markov's inequality completes the proof. □ 

Lemma C.ll. The expected number of solutions a G in which more than k^2~^n variables support 

at most four clauses is < exp{—nk^/2^). 

Proof. Fix an assignment a G {0, 1}^, say a = 1. Then number of clauses supported by each x G F is 
asymptotically Poisson with mean A. Let £x be the event that x supports no more than three clauses. Then 

P [£x] < A3exp(-A) < A:^2^^'"\ 

The events {£x)xev are negatively correlated. Therefore, the total number X of variables x £ V for which 
£x occurs is stochastically dominated by a binomial variable Bin(?z, k^2~''~^). Hence, the assertion follows 
from Chernoff bounds. □ 

Let us call a set $S \subset V$ self-contained if each variable in $S$ supports at least two clauses that consist of
variables in $S$ only. There is a simple process that yields a (possibly empty) self-contained set $R$.

• For each variable $x$ that supports at least one clause, choose such a clause $C_x$ randomly.

• Let $R$ be the set of all variables that support at least four clauses.

• While there is a variable $x \in R$ that supports fewer than two clauses $\ne C_x$ that consist of variables
of $R$ only, remove $x$ from $R$.

The clauses $C_x$ will play a special role later.
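
The peeling process above can be sketched in a few lines. The data representation is an assumption made purely for illustration: `clauses` maps a clause identifier to the tuple of variables occurring in it, and `supports` maps each variable to the list of clause identifiers it supports under the fixed assignment. The function name and the toy instance are hypothetical.

```python
import random


def peel_self_contained(clauses, supports, seed=0):
    """Run the peeling process and return (R, reserved).

    clauses:  dict clause id -> tuple of the variables in that clause
    supports: dict variable  -> list of ids of the clauses it supports
    reserved: the randomly chosen clause C_x for each supporting variable
    """
    rng = random.Random(seed)
    reserved = {x: rng.choice(ids) for x, ids in supports.items() if ids}
    # Start from all variables that support at least four clauses.
    R = {x for x, ids in supports.items() if len(ids) >= 4}
    changed = True
    while changed:
        changed = False
        for x in sorted(R):
            # Clauses supported by x, other than C_x, lying entirely inside R.
            inside = sum(
                1
                for c in supports[x]
                if c != reserved[x] and all(y in R for y in clauses[c])
            )
            if inside < 2:
                R.remove(x)
                changed = True
    return R, reserved


# Toy instance: variables 0, 1, 2 support four clauses each on {0, 1, 2};
# variable 3 supports four clauses that all contain variable 4, which
# supports nothing and is therefore never part of R.
clauses = {}
supports = {x: [] for x in range(5)}
for x in (0, 1, 2):
    for _ in range(4):
        cid = len(clauses)
        clauses[cid] = (0, 1, 2)
        supports[x].append(cid)
for _ in range(4):
    cid = len(clauses)
    clauses[cid] = (3, 4, 0)
    supports[3].append(cid)

R, reserved = peel_self_contained(clauses, supports)
print(sorted(R))
```

On termination every remaining variable supports at least two clauses, distinct from its reserved clause, that consist of variables of $R$ only, so $R$ is self-contained; in the toy instance variable 3 is peeled off and $R = \{0, 1, 2\}$.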

Lemma C.12. The expected number of solutions a G S{^) for which the above process yields a set R of 
size \R\ < (1 — k^ /2^)n is bounded by exp(— J7(n)). 

Proof. Let a G {0,l}^bean assignment, say a = 1. Let Q be the set of all variables that support fewer 
than three clauses. By Lemma IC. Ill we may condition on |Q| < k^2^^n. Assume that its size is \R\ < 
(1 - k^/2^)n. Then there exists a set 5 C \ (-R U Q) of size \k^n/2^ < S < k^n/2^ such that each 
variable in S supports two clauses that contain another variable from SUQ. With s = \S\/n the probability 
of this event is bounded by 



m 
2sn 



< [Aek' 



Hence, the expected number of sets $S$ for which the aforementioned event occurs is bounded by

Since $E[Z(\Phi)] \le \exp(O_k(2^{-k}n))$, the assertion follows. □
Corollary C.13. Let $d$ be chosen from $D$. Then the expected number of solutions $\sigma \in \mathcal{S}(\Phi_d)$ for which the
above process yields a set R of size \R\ < (1 — Ic' /2^)n is bounded by exp( — J7(n)). 

Proof. Since the random formula $\Phi$ can be generated by first choosing $d$ from $D$ and then generating $\Phi_d$,
the assertion follows from Lemma C.12. □



Let us call a variable $x$ attached if $x$ supports a clause whose other $k - 1$ variables belong to $R$.

Corollary C.14. W.h.p. a degree sequence d chosen from D has the following property. Let a G {0, 1}^ 
and let A be the event that a G and that a satisfies Conditions L and 2. in Definition\l\ Moreover, 

let Y be the number variables that support a clause but that are not attached. Then 



A 



l-o(l). 



Proof We may assume that d satisfies (IC.4I ). Let F = V \ R. Then (IC.4I) ensures that ^^^^^ < 
Therefore, for each of the "special" clause Cx that we reserved for each x that supports at least one clause 
the probability of containing a variable from F \ {x} is bounded by 

, Vol(F) 3k'^ 

Furthermore, these events are negatively correlated (due to the bound on Vol(F)). Since |F \ i?| < k^n/2^ 
w.h.p. by Corollary IC.13I the assertion thus follows from Chernoff bounds. □ 

Let us call a variable x £ V ^-rigid in a solution a G S{^d) if for any solution r G S{^d) with 
t{x) 7^ a{x) we have dist((T, r) > ^n. 

Corollary CIS. W.h.p. a degree sequence d chosen from D has the following property. Let a G {0, 1}^ 
and let A be the event that a G ^(^d) and that a satisfies Conditions L and 2. in Definition |7] Moreover, 
let Y be the number of variables that support a clause but that are k^^ -rigid. Then 



Y < nk^^A-'' I A 



I- oil). 



Proof. We condition on the event A. By Corollary IC.13I we may assume that the self-contained set R has 
size \R\ > (1 — k^ /2^)n. Assume that there is r G dist(cj, r) < n/k^, such that 



$$\Delta = \{x \in R : \tau(x) \neq \sigma(x)\}$$



is non-empty. Then $\Delta$ is dense. Indeed, every $x \in \Delta$ supports at least two clauses, and thus $\Delta$ must contain
another variable from each of them. Thus, Lemma lC.lOl shows that \A\ >n/k^, which is a contradiction. 

Hence, w.h.p. all variables a; G are A:~^-rigid. Furthermore, if a variable y is attached, then for any 
solution r with r(y) ^ a{y) there is a; G ii such that t{x) ^ a{x). Consequently, all attached variables are 
/j-S-rigid w.h.p. Therefore, the assertion follows from Corollai'v lC.14[ □ 

To complete the proof, we need the following fairly simple lemma. 

Lemma C.16. The expected number of pairs of solutions a,T ^ such that ^ < dist{a,T) < — 

2-^'/2)n is < exp(-j7(n)). 
Proof. For a given $0 < \alpha < 1$ let $P_\alpha$ denote the number of pairs $\sigma, \tau \in \mathcal{S}(\Phi)$ with $\mathrm{dist}(\sigma, \tau) = \alpha n$. As
worked out in [3], we have

$$\frac{1}{n} \ln E[P_\alpha] \le \ln 2 - \alpha \ln \alpha - (1 - \alpha) \ln(1 - \alpha) + r \ln\left(1 - 2^{2-k} + 2^{1-k}\left(\alpha^k + (1 - \alpha)^k\right)\right).$$



It is a mere exercise in calculus to verify that the r.h.s. is strictly negative for all A; ^ < a < ^ — 2 ^Z"^. □ 
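The sign of the right-hand side can be probed numerically. The sketch below evaluates $\ln 2 - \alpha\ln\alpha - (1-\alpha)\ln(1-\alpha) + r\ln(1 - 2^{2-k} + 2^{1-k}(\alpha^k + (1-\alpha)^k))$ for one illustrative choice of parameters; the function name `pair_exponent`, the value $k = 10$ and the density $r = 2^{k-1}\ln 2 - 1$ are assumptions made for the example only, and a few sample points are of course no substitute for the calculus argument.

```python
import math

def pair_exponent(alpha, k, r):
    """Bound on (1/n) ln E[P_alpha] for 0 < alpha < 1."""
    entropy = -alpha * math.log(alpha) - (1 - alpha) * math.log(1 - alpha)
    # Probability that two assignments at relative distance alpha both
    # NAE-satisfy one random clause.
    both_sat = 1 - 2 ** (2 - k) + 2 ** (1 - k) * (alpha ** k + (1 - alpha) ** k)
    return math.log(2) + entropy + r * math.log(both_sat)

k = 10
r = 2 ** (k - 1) * math.log(2) - 1.0   # illustrative density (assumed)
for alpha in (0.1, 0.25, 0.4, 0.5):
    print(alpha, pair_exponent(alpha, k, r))
```

For these parameters the value at $\alpha = 1/4$ is negative while the value at $\alpha = 1/2$ is positive, which is consistent with the lemma excluding only distances bounded away from $n/2$.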

Corollary C.17. W.h.p. a degree sequence d chosen from D has the following property. The expected num- 
ber of pairs of solutions a,T €^ ^(^d) ^i^ch that ^ < dist{a, r) < — 2~'^/^)n is < exp(— J7(n)). 

Combining Lemma ICTTl Proposition IC.8I Corollary |C. 151 and Corollary IC.17I we obtain 



Corollary C.18. W.h.p. a degree sequence d chosen from D has the following property. Let a G {0, 1} 
and let A be the event that a € S{^d) and that a satisfies Conditions 1. and 2. in Definition\l\ Then 

[3. in Definition\l}is satisfied | ^] = 1 — o(l). 

Finally, Proposition |C.2| is a direct consequence of Lemma ICTTl Proposition IC.8I and Corollary |C. 181 



V 



C.3 Proof of Proposition C.8

Let us begin with establishing the probable properties of d that we will need. 

Lemma C.19. Let d = (di, . . . , dn) be from the distribution D = D(A;, r, n). Then, with high probability, 
for any < a < (fcr)^/^, the sequence d has the following properties. First, for all i such that \i — kr\ < 
aVkr 

$$D_i = |\{j : d_j = i\}| = (1 + o(1)) \Pr[\mathrm{Po}(kr) = i]\, n. \tag{C.5}$$
Moreover, the remaining variables satisfy 



D 



>a 



: \dj - kr\ > a\/A^} < 2e~"'/2^ and dj < le''^^ {kr)' 



n. 



(C.6) 



Proof. Let $P_1, \ldots, P_n$ be independent $\mathrm{Po}(kr)$ random variables, and note that the joint distributions of
$(d_1, \ldots, d_n)$ and $(P_1, \ldots, P_n)$, conditional on $\sum_{1 \le i \le n} P_i = krn$, coincide. Since the expectation of the
sum of the Pj's equals km, Lemma [ATT] applied with 5 = implies that for any event E we have that 



$$\Pr[d \in \mathcal{E}] = \Pr\Big[(P_1, \ldots, P_n) \in \mathcal{E} \,\Big|\, \sum_{1 \le i \le n} P_i = krn\Big] \le O(n^{1/2}) \Pr[(P_1, \ldots, P_n) \in \mathcal{E}].$$

In other words, it is sufficient to show that the statements in the lemma hold with probability $1 - o(n^{-1/2})$ for
a sequence of independent Poisson random variables. The statements then follow from the Chernoff bounds
and the fact that for any $\lambda = kr$ and $a$ as assumed

Pr[Po(A) > aVX] < 26""" and ^ iPr[Po(A) = j] < le'^^/^X^. 

j: |i-A|>aVA 



□ 
The aim of this section is to show that for any $d$ satisfying the conclusions of Lemma C.19,



- lnPr[cr has Properties 1. and 2. from Definitional a G cS(<?d)] = + Oki^'^), (C.7) 
n 

i.e., Proposition C.8 holds. We will assume that $\sigma = \mathbf{1}$ throughout.

First of all, let $C$ denote the number of critical clauses. Given that $\mathbf{1}$ is a NAE-satisfying assignment,
there are for each clause in total $2^k - 2$ ways to choose the signs of the variables, each one of them
being equally likely. Since the number of ways to choose the signs so as to obtain a critical clause is $2k$,
the probability that a given clause is critical is $k/(2^{k-1} - 1)$. Moreover, the events that different clauses are
critical are independent, implying that $C$ is distributed like $\mathrm{Bin}(m, k/(2^{k-1} - 1))$.
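The two counts used here can be confirmed by brute force for small $k$. The sketch below assumes that a clause is critical under the all-true assignment when exactly one literal takes a value different from all the others, and it requires $k \ge 3$ so that the two cases "exactly one true literal" and "exactly one false literal" do not coincide. The function name is hypothetical.

```python
from itertools import product

def count_sign_patterns(k):
    """Enumerate the sign patterns of one k-clause under the all-true
    assignment; a literal is true iff its sign is +1.
    Returns (number of NAE-satisfied patterns, number of critical patterns)."""
    nae, critical = 0, 0
    for signs in product((1, -1), repeat=k):
        true_literals = sum(1 for s in signs if s == 1)
        if 0 < true_literals < k:            # not all literals equal
            nae += 1
            if true_literals in (1, k - 1):  # one literal disagrees with the rest
                critical += 1
    return nae, critical

for k in range(3, 8):
    print(k, count_sign_patterns(k))
```

The ratio of the two counts is $2k/(2^k - 2) = k/(2^{k-1} - 1)$, the success probability of the binomial distribution of $C$.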

Note that E[C \ 1 G S{0d)] = m ■ k/{2^-^ - 1) = An. By applying Lemma IAHI with 5 = we thus 
obtain that 

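$$\Pr[C = \lambda n \mid \mathbf{1} \in \mathcal{S}(\Phi_d)] = \Theta(n^{-1/2}).$$
It follows that the probability in (C.7) equals

$$\Theta(n^{-1/2}) \cdot \Pr[\mathbf{1} \text{ is } \beta\text{-heavy and no variable supports} > 3k \text{ clauses} \mid C = \lambda n \text{ and } \mathbf{1} \in \mathcal{S}(\Phi_d)]. \tag{C.8}$$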

In the sequel we adopt a different formulation of this probabilistic question that is based on the classical
occupancy problem. Let us think of the variables as bins, such that the $i$th bin has capacity $d_i$, where
$d = (d_1, \ldots, d_n)$. In other words, we assume that the $i$th bin contains $d_i$ distinguished "slots". Then we
throw randomly $\lambda n$ balls into the bins, i.e., the $j$th ball chooses uniformly at random one of the remaining
$\sum_{1 \le i \le n} d_i - (j - 1) = krn - j + 1$ available slots, for each $1 \le j \le \lambda n$. In this setting, the probability
in (C.8) is equal to the probability that in the balls-into-bins game with the given capacity constraints the
number of empty bins equals $(1 - \beta)e^{-\lambda}n$, and no bin contains more than $3k$ balls. More precisely, let $R_i$,
where $1 \le i \le n$, denote the number of balls selected from the $i$th bin. Then, the probability in (C.8) equals

$$\Pr[\mathcal{A} \text{ and } \mathcal{B}], \quad \text{where } \mathcal{A} = \text{“}|\{i : R_i = 0\}| = (1 - \beta)e^{-\lambda}n\text{”} \text{ and } \mathcal{B} = \text{“}\forall\, 1 \le i \le n : R_i \le 3k\text{”}.$$

We will show that the probability above is exp{(/(/3) + Ofc(4~'^))n}, which together with (IC.8I ) completes 



the proof of (ICTI) . 
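The occupancy formulation is easy to simulate. The following sketch throws balls into distinct slots chosen uniformly without replacement and reports the fraction of empty bins; the function name `empty_fraction` and the toy parameters (2000 bins of capacity 100 and $\lambda = 2$) are assumptions chosen to keep the example fast and are not the parameters of the proof.

```python
import math
import random

def empty_fraction(capacities, balls, seed=0):
    """Throw `balls` balls into distinct slots chosen uniformly at random
    (without replacement) and return the fraction of empty bins."""
    rng = random.Random(seed)
    # owner[s] is the bin that slot s belongs to.
    owner = [i for i, d in enumerate(capacities) for _ in range(d)]
    hit = {owner[s] for s in rng.sample(range(len(owner)), balls)}
    return 1 - len(hit) / len(capacities)

n, capacity, lam = 2000, 100, 2.0      # toy parameters (assumed)
frac = empty_fraction([capacity] * n, int(lam * n))
print(frac, math.exp(-lam))
```

When the capacities are much larger than $\lambda$, the empirical fraction of empty bins should land close to $e^{-\lambda}$, which is the heuristic behind the target value $(1-\beta)e^{-\lambda}n$ for the number of empty bins.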

In order to compute the probability of the event "$\mathcal{A}$ and $\mathcal{B}$" we resort to the following experiment. Instead
of throwing $\lambda n$ balls into the available slots, we decide for each slot independently with probability $\lambda/kr$
whether it receives a ball or not. Let $T$ be the total number of balls that are thrown in this setting, and let
$B_i \sim \mathrm{Bin}(d_i, \lambda/kr)$ be the number of balls that the $i$th bin receives. Since the total number of slots is
$krn$, we have that $E[T] = \lambda n$. Moreover, conditional on any value of $T$, the $T$ slots that receive a ball are
a random subset of size $T$ of all available slots. Thus, conditional on "$T = \lambda n$" the joint distributions of
$(R_1, \ldots, R_n)$ and $(B_1, \ldots, B_n)$ coincide, and by abbreviating $X_i = |\{j : B_j = i\}|$ and $X_{>3k} = \sum_{i > 3k} X_i$ we obtain that

$$\Pr[\mathcal{A} \text{ and } \mathcal{B}] = \Pr\big[X_0 = (1 - \beta)e^{-\lambda}n \text{ and } X_{>3k} = 0 \mid T = \lambda n\big]. \tag{C.9}$$



Before we estimate the latter probability, let us give some intuitive explanation why it should have the
exponential form claimed above, i.e., why the conclusion of the proposition is true. Our assumption on the bin
capacities (C.5) guarantees that most bins have a capacity very close to $kr \approx k2^{k-1} \ln 2$. Recall also that the
probability that any slot receives a ball is $\lambda/kr \approx 2^{-k+1}$. This means that the expected number of balls that
a typical bin receives is $\approx \lambda$, which is far smaller than the capacity of that bin. But we can say even more:
since the number of balls that are received by a typical bin is $\approx \mathrm{Bin}(kr, \lambda/kr)$, and the expected value is far
less than $kr$, it is reasonable to assume that this number can be approximated well by a $\mathrm{Po}(\lambda)$ distribution.
So, the probability that a bin remains empty is close to $e^{-\lambda}$, and then the probability that the number of
empty bins is exactly $(1 - \beta)e^{-\lambda}n$ should be close to $\Pr[\mathrm{Bin}(n, e^{-\lambda}) = (1 - \beta)e^{-\lambda}n]$. The argument then
completes by applying Lemma IR!21 
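The approximations used in this heuristic can be checked numerically. The sketch below assumes $r = 2^{k-1}\ln 2 - c$ and $\lambda = kr/(2^{k-1} - 1)$, as used elsewhere in this appendix; the helper name `parameters` and the values $k = 10$, $c = 1$ are illustrative assumptions.

```python
import math

def parameters(k, c=1.0):
    """r = 2^(k-1) ln 2 - c and lambda = k r / (2^(k-1) - 1); c is illustrative."""
    r = 2 ** (k - 1) * math.log(2) - c
    lam = k * r / (2 ** (k - 1) - 1)
    return r, lam

k = 10
r, lam = parameters(k)
capacity = round(k * r)                  # capacity of a "typical" bin
p_slot = lam / (k * r)                   # probability that a slot receives a ball
p_empty_binomial = (1 - p_slot) ** capacity
print(lam, k * math.log(2))              # lambda is close to k ln 2
print(p_slot, 2 ** (1 - k))              # slot probability is close to 2^(1-k)
print(p_empty_binomial, math.exp(-lam))  # Bin(kr, lambda/kr) vs Po(lambda) at 0
```

Already for $k = 10$ the binomial and Poisson probabilities of an empty bin differ by less than one percent in relative terms.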

Let us now put the above intuitive reasoning on rigorous ground. First of all, note that in the right-hand
side of (C.9) the condition "$T = \lambda n$" is global, in the sense that it binds the values of all variables
$B_1, \ldots, B_n$. We can get rid of this global restriction by applying the law of total probability. We obtain that

$$\Pr[\mathcal{A} \text{ and } \mathcal{B}] = \frac{\Pr[T = \lambda n \text{ and } X_0 = (1 - \beta)e^{-\lambda}n \text{ and } X_{>3k} = 0]}{\Pr[T = \lambda n]} = \Pr\big[T = \lambda n \text{ and } X_0 = (1 - \beta)e^{-\lambda}n \mid X_{>3k} = 0\big] \cdot \frac{\Pr[X_{>3k} = 0]}{\Pr[T = \lambda n]}. \tag{C.10}$$

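The remainder of the proof is devoted to showing the following bounds.

$$\Pr[T = \lambda n] = \Theta(n^{-1/2}), \tag{C.11}$$

$$\Pr[X_{>3k} = 0] \ge e^{-O_k(4^{-k})n}, \tag{C.12}$$

$$\Pr\big[T = \lambda n \text{ and } X_0 = (1 - \beta)e^{-\lambda}n \mid X_{>3k} = 0\big] \ge \Pr[\mathrm{Bin}(n, e^{-\lambda}) = (1 - \beta)e^{-\lambda}n] \cdot e^{-O_k(k4^{-k})n}. \tag{C.13}$$

The three bounds together with (C.10) imply that

$$\Pr[\mathcal{A} \text{ and } \mathcal{B}] \ge \Pr[\mathrm{Bin}(n, e^{-\lambda}) = (1 - \beta)e^{-\lambda}n] \cdot e^{-O_k(k4^{-k})n},$$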



and the proof of the proposition is completed after applying Lemma |R!2l 

In the remainder of the proof we will write Vi for the set of bins with capacity i and for the set 
of bins with capacity smaller than kr — a\fhr or larger than kr + aVkr, and note that iVil = Di and 



Proof of (IC.llI ). Since T is distributed like Bin(A;rn, A/A;r) we have that E[r] = An. The result then follows 
by applying Lemma IaTT] with 6 = OtoT. 



Proof of (C.12). Recall that the number of bins with capacity $i$ is denoted by $D_i$. Since the number of balls
in a bin with capacity $i$ is distributed like $\mathrm{Bin}(i, \lambda/kr)$, and these variables are all independent, we obtain
that

$$\Pr[X_{>3k} = 0] = \prod_{i \ge 0} \Pr[\mathrm{Bin}(i, \lambda/kr) \le 3k]^{D_i} \ge \prod_{i:\, |i - kr| \le k\sqrt{kr}} \Pr[\mathrm{Bin}(i, \lambda/kr) \le 3k]^{D_i} \cdot \prod_{i:\, |i - kr| > k\sqrt{kr}} \Pr[\mathrm{Bin}(i, \lambda/kr) = 0]^{D_i}. \tag{C.14}$$



Our assumption (C.6) guarantees that $d$ is such that

^ iDi= dj < 2e-^"l\krfn. 

i: \i-kr\>kVkr jeV>l' 

Thus, if k is sufficiently large, the last term in (IC.14I ) can be bounded with 



Yl Pr[Bin(i,A/A:r) = 0]^' = JJ (l 

i: \i-kr\>ky/k^ j&V>'' 



kr 



kr J 



> e" 



(C.15) 
Let us now consider the terms involving all $i$ such that $|i - kr| \le k\sqrt{kr}$ in (C.14). By using the estimate
$\binom{a}{b} \le (ea/b)^b$ we infer that for any such $i$ and sufficiently large $k$ we have

$$\Pr[\mathrm{Bin}(i, \lambda/kr) > 3k] \le \binom{i}{3k} \left(\frac{\lambda}{kr}\right)^{3k} \le \left(\frac{ei}{3k} \cdot \frac{\lambda}{kr}\right)^{3k} \le \left(\frac{e\,kr(1 + o_k(1))}{3k} \cdot \frac{k \ln 2\,(1 + o_k(1))}{kr}\right)^{3k} \le 4^{-k}. \tag{C.16}$$

Thus, since $\sum_{i \ge 0} D_i = n$, by using the fact $1 - x = e^{-x - \Theta(x^2)}$, valid for all $|x| < 1$,

$$\prod_{i:\, |i - kr| \le k\sqrt{kr}} \Pr[\mathrm{Bin}(i, \lambda/kr) \le 3k]^{D_i} \ge \prod_{i:\, |i - kr| \le k\sqrt{kr}} (1 - 4^{-k})^{D_i} = e^{-O_k(4^{-k})n}.$$

This result, together with (C.15) and (C.14), finally proves (C.12).
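The chain of inequalities behind (C.16) can be sanity-checked for a bin of typical capacity. The sketch below compares the exact binomial tail with the bound $(eNp/3k)^{3k}$ and with $4^{-k}$; the choices $k = 10$, $r = 2^{k-1}\ln 2 - 1$ and $N = \mathrm{round}(kr)$, as well as the helper names, are assumptions made for illustration.

```python
import math

def log_binom_pmf(N, p, j):
    """Natural logarithm of Pr[Bin(N, p) = j]."""
    return (math.lgamma(N + 1) - math.lgamma(j + 1) - math.lgamma(N - j + 1)
            + j * math.log(p) + (N - j) * math.log1p(-p))

def tail_above(N, p, threshold):
    """Pr[Bin(N, p) > threshold], summed term by term in log space."""
    return sum(math.exp(log_binom_pmf(N, p, j))
               for j in range(threshold + 1, N + 1))

k = 10
r = 2 ** (k - 1) * math.log(2) - 1.0     # illustrative density (assumed)
lam = k * r / (2 ** (k - 1) - 1)
N = round(k * r)                         # capacity of a typical bin
p = lam / (k * r)
exact = tail_above(N, p, 3 * k)
bound = (math.e * N * p / (3 * k)) ** (3 * k)
print(exact, bound, 4.0 ** (-k))
```

For these values the exact tail is several orders of magnitude below the bound, which in turn sits just below $4^{-k}$; the slack in the last step comes from $(e \ln 2/3)^3 < 1/4$.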
Proof of (C.13). Note that

$$\Pr[\mathrm{Bin}(n, e^{-\lambda}) = (1 - \beta)e^{-\lambda}n] = \binom{n}{(1 - \beta)e^{-\lambda}n} (e^{-\lambda})^{(1 - \beta)e^{-\lambda}n} (1 - e^{-\lambda})^{(1 - (1 - \beta)e^{-\lambda})n}. \tag{C.17}$$

In the following proof we will approximate the probability of the event "$T = \lambda n$ and $X_0 = (1 - \beta)e^{-\lambda}n$",
conditional on $X_{>3k} = 0$, by the right-hand side of the above equation times an error term, which is of order
$\exp\{-O_k(k4^{-k})n\}$. In particular, we will identify the most relevant objects that contribute precisely these
terms to the desired probability.

In order to prove a lower bound for the probability of the event "$T = \lambda n$ and $X_0 = (1 - \beta)e^{-\lambda}n$"
we will consider only specific configurations of balls that lead to the desired outcome. More precisely, let
$b = (b_1, \ldots, b_n)$ denote a possible outcome of the random experiment that we study, where $b_i$ denotes the
number of balls in the $i$th bin. We will call $b$ balanced if it has the following properties:

1. Let $j \in V_i$, where $|i - kr| > k\sqrt{kr}$. Then $b_j = 0$. Informally, the $D^{>k}$ bins with "too small" or "too
big" capacities are empty.

2. Let $V'_i$ denote the set of bins in $V_i$ that do not receive a ball. For all $i$ such that $|i - kr| \le k\sqrt{kr}$

$$D'_i = |V'_i| = D_i \cdot \frac{(1 - \beta)e^{-\lambda} - D^{>k}/n}{1 - D^{>k}/n}.$$

Informally, the fraction of empty bins among those in $V_i$ is the same (and approximately equal to
$(1 - \beta)e^{-\lambda}$) for all relevant $i$.

3. Let $T_i$ denote the total number of balls in all bins in $V_i$. Then, for all $i$ such that $|i - kr| \le k\sqrt{kr}$

$$T_i = t_i := \frac{D_i - D'_i}{1 - (1 - \beta)e^{-\lambda}} \cdot \frac{\lambda i}{kr} \cdot x,$$

where $x$ is chosen such that the sum of all $t_i$ is $\lambda n$. As we shall see later, see (C.27), $x$ is very close to 1.
Informally, this requires that the average number of balls per bin in $V_i$ is approximately $\lambda$ for all
relevant $i$.

4. For all $1 \le i \le n$ we have $b_i \le 3k$, i.e., $X_{>3k}(b) = 0$.

By our construction, note that if $b$ is balanced, then $X_0(b) = (1 - \beta)e^{-\lambda}n$ and $T(b) = \lambda n$. Thus,

$$\Pr\big[T = \lambda n \text{ and } X_0 = (1 - \beta)e^{-\lambda}n \mid X_{>3k} = 0\big] \ge \Pr[(B_1, \ldots, B_n) \text{ is balanced}]. \tag{C.18}$$
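The claim that a balanced outcome has exactly $(1-\beta)e^{-\lambda}n$ empty bins is a one-line computation: the outlier bins are all empty, and the remaining bins are empty at the common rate prescribed in property 2. The sketch below checks this arithmetic, ignoring integrality; the function name and all numerical values are toy assumptions.

```python
import math

def balanced_alpha(beta, lam, n, d_far):
    """Common fraction of empty bins in each class V_i with
    |i - kr| <= k*sqrt(kr), given that all d_far outlier bins are empty."""
    target = (1 - beta) * math.exp(-lam)
    return (target - d_far / n) / (1 - d_far / n)

beta, lam, n, d_far = 0.3, 3.0, 10 ** 6, 25    # toy values (assumed)
alpha = balanced_alpha(beta, lam, n, d_far)
total_empty = d_far + alpha * (n - d_far)
print(alpha, total_empty, (1 - beta) * math.exp(-lam) * n)
```

The last two printed numbers agree up to floating-point error, and `alpha` differs from $(1-\beta)e^{-\lambda}$ only by a term of order `d_far / n`.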
In the sequel we will estimate the latter probability. First of all, note that the number of ways to choose the
empty bins in a balanced $b$ is

$$\prod_{i:\, |i - kr| \le k\sqrt{kr}} \binom{D_i}{D'_i}. \tag{C.19}$$

Note that bins contained in $V^{>k}$ do not have to be counted explicitly, since they are contained in the set
of empty bins per definition. Let us write $\mathrm{Bin}_{i,j}(N, p)$ for a binomially distributed random variable that
is conditioned on being in the interval $[i, j]$. Then, after having fixed the locations of the empty bins, the
probability that $(B_1, \ldots, B_n)$ is balanced with precisely the chosen set of empty bins is

$$\prod_{i:\, |i - kr| > k\sqrt{kr}} \Pr[\mathrm{Bin}_{0,3k}(i, \lambda/kr) = 0]^{D_i} \cdot \prod_{i:\, |i - kr| \le k\sqrt{kr}} \Pr[\mathrm{Bin}_{0,3k}(i, \lambda/kr) = 0]^{D'_i} \cdot \Pr[\mathcal{T}_i \mid X_{>3k} = 0], \tag{C.20}$$

where $\mathcal{T}_i$ is the event "$T_i = t_i$ and $\forall j \in V_i \setminus V'_i : B_j \ge 1$". Let $T'_i$ be a sum of $D_i - D'_i$ independent
variables, which are distributed like $\mathrm{Bin}_{1,3k}(i, \lambda/kr)$. Then

$$\Pr[\mathcal{T}_i \mid X_{>3k} = 0] = \Pr\left[T'_i = \frac{D_i - D'_i}{1 - (1 - \beta)e^{-\lambda}} \cdot \frac{\lambda i}{kr} \cdot x\right] \cdot \Pr[\mathrm{Bin}_{0,3k}(i, \lambda/kr) \ge 1]^{D_i - D'_i}. \tag{C.21}$$

The probability that $(B_1, \ldots, B_n)$ is balanced is then the product of the terms in (C.19) and (C.20). In the
remaining proof we will estimate the five terms in (C.19)-(C.21).

We begin with estimating the product in (C.19). Let $\alpha$ be such that $D'_i = \alpha D_i$, and note that $\alpha$ is
independent of $i$. Since $0 \le D^{>k} \le 2e^{-k^2/2}n$, see (C.6), we obtain that

$$\alpha = \frac{(1 - \beta)e^{-\lambda} - D^{>k}/n}{1 - D^{>k}/n} = (1 - \beta)e^{-\lambda} + O(1)e^{-k^2/2}. \tag{C.22}$$

By applying Proposition |A. 21 with a = (1 — /3)e ^ and e = 0(1) e '^^/^ we infer that 



n 

i:\i—kr\<k\/kr 



D' 



n 



0(1) 



y/a{l - a)Di 



,(/f(a)+Ofc(fce-'= _ ff(a)(n-D>'=)+Ofe(fce-'= /2)n 



i:\i—kr\<k\/kr 

By using once more the fact < D-^ < 2e~'^^/^n and by applying Proposition IA.2I we infer that 



n 

i:\i—kr\<k\/kr 



D' 



n 



(1 - /3)e-^n 



(C.23) 



This estimate contributes the binomial coefficient in (C.17) to our lower bound for the probability in (C.18).
It remains to bound the expression in (C.20). Let us begin with considering the first product, which accounts
for all $i$ that deviate significantly from $kr$. Since $\Pr[\mathrm{Bin}_{i,j}(N, p) = \ell] \ge \Pr[\mathrm{Bin}(N, p) = \ell]$ for all $N, p, i, j$
and $i \le \ell \le j$ we have



n Pr 

i: |«— fcr|>fc\/fcr 



Bino,3fc i, 



kr 







(C.24) 



Let us consider the middle term in (C.20). Using again the property $\Pr[\mathrm{Bin}_{i,j}(N, p) = \ell] \ge \Pr[\mathrm{Bin}(N, p) = \ell]$
and the facts $1 - x = e^{-x - \Theta(x^2)}$ and $\lambda = k \ln 2 + O_k(k2^{-k})$ and $r = 2^{k-1} \ln 2 - c$ we obtain



n Pr 

i:\i—kr\<ky/kr 



Bino,3fc I i, 



A 

kr 



D' 



> exp 



i: \i—kr\<k\/kr ) 
n.



By using again the property of d in (IC.6I) we infer that 

iD[ = krn- ^ dj = km - Ok{e~^^ 

i:\i-kr\<kVkr je'D^'' 

Recall that $\alpha = (1 - \beta)e^{-\lambda} + O(1)e^{-k^2/2}$. Thus the middle term in (C.20) is at least

$$\prod_{i:\, |i - kr| \le k\sqrt{kr}} \Pr[\mathrm{Bin}_{0,3k}(i, \lambda/kr) = 0]^{D'_i} = (e^{-\lambda})^{(1 - \beta)e^{-\lambda}n} \cdot e^{-O_k(k4^{-k})n}.$$

This estimate contributes the $(e^{-\lambda})^{(1 - \beta)e^{-\lambda}n}$ term in (C.17) to our lower bound for the probability in (C.18).
We finally consider the probability of the event $\mathcal{T}_i$ in (C.20), cf. also (C.21). The last term in (C.21) can be
bounded as follows. First, note that



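$$\prod_{i:\, |i - kr| \le k\sqrt{kr}} \Pr[\mathrm{Bin}_{0,3k}(i, \lambda/kr) \ge 1]^{D_i - D'_i} \ge \prod_{i:\, |i - kr| \le k\sqrt{kr}} \Pr[1 \le \mathrm{Bin}(i, \lambda/kr) \le 3k]^{D_i - D'_i}.$$

By using (C.16) and the fact $1 - x = e^{-x - \Theta(x^2)}$, where $0 \le x < 1$, we obtain

$$\Pr[1 \le \mathrm{Bin}(i, \lambda/kr) \le 3k] \ge 1 - (1 - \lambda/kr)^i - 4^{-k} = \exp\{-(1 - \lambda/kr)^i + O_k(4^{-k})\}.$$
With this estimate at hand we can bound the last term in (C.21). We get that

$$\prod_{i:\, |i - kr| \le k\sqrt{kr}} \Pr[\mathrm{Bin}_{0,3k}(i, \lambda/kr) \ge 1]^{D_i - D'_i} \ge \exp\Big\{-(1 - \alpha) \sum_{i:\, |i - kr| \le k\sqrt{kr}} (1 - \lambda/kr)^i D_i + O_k(4^{-k})n\Big\}.$$

Our assumption (C.5) on $d$ guarantees that $D_i = (1 + o(1)) \Pr[\mathrm{Po}(kr) = i]\,n$. Thus, the sum in the previous
equation is at most

$$(1 + o(1))\,n \sum_{i \ge 0} (1 - \lambda/kr)^i \Pr[\mathrm{Po}(kr) = i] = (1 + o(1))e^{-\lambda}n,$$

from which we get that, by applying again the fact $1 - x = e^{-x - \Theta(x^2)}$,

$$\prod_{i:\, |i - kr| \le k\sqrt{kr}} \Pr[\mathrm{Bin}_{0,3k}(i, \lambda/kr) \ge 1]^{D_i - D'_i} \ge (1 - e^{-\lambda})^{(1 - (1 - \beta)e^{-\lambda})n} \cdot e^{-O_k(k4^{-k})n}. \tag{C.26}$$

This estimate contributes the last missing term in (C.17) to our lower bound for the probability in (C.18).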
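The evaluation of the Poisson sum above rests on the identity $\sum_{i \ge 0} (1 - p)^i \Pr[\mathrm{Po}(\mu) = i] = e^{-\mu p}$, applied with $\mu = kr$ and $p = \lambda/kr$. A quick numerical check, with toy stand-ins for $kr$ and $\lambda$ and a hypothetical function name, reads as follows.

```python
import math

def poisson_mixture(mu, p, terms=2000):
    """Sum over i of (1 - p)^i * Pr[Po(mu) = i], truncated after `terms` terms."""
    total = 0.0
    for i in range(terms):
        log_pmf = -mu + i * math.log(mu) - math.lgamma(i + 1)
        total += math.exp(log_pmf + i * math.log1p(-p))
    return total

mu, lam = 500.0, 3.0                  # toy stand-ins for kr and lambda (assumed)
print(poisson_mixture(mu, lam / mu), math.exp(-lam))
```

The truncation at 2000 terms is harmless here because a Poisson variable with mean 500 exceeds 2000 only with negligible probability.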

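It remains to bound the probability for the event "$T'_i = t_i$" in (C.21), for all $i$ with the property $|i - kr| \le
k\sqrt{kr}$. Recall that $t_i = \frac{D_i - D'_i}{1 - (1 - \beta)e^{-\lambda}} \cdot \frac{\lambda i}{kr} \cdot x$, where $x$ is such that the sum of the $t_i$'s is $\lambda n$. Let us begin with

estimating the value of $x$. Note that

$$\lambda n = \sum_{i:\, |i - kr| \le k\sqrt{kr}} t_i = \frac{x\lambda(1 - \alpha)}{kr(1 - (1 - \beta)e^{-\lambda})} \sum_{i:\, |i - kr| \le k\sqrt{kr}} iD_i.$$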

Recall (IC.22I) . which guarantees that a = (1 — /3)e ^ + 0(l)e '^^Z^. Moreover, the property (IC.6I ) allows 
us to assume for large k that Y2j(^v>'' '^j — ^"^"^^ Thus, the above equation simplifies to 



An 



xA(l- (l-/3)e-^ + Ofc(e^^-'/ ^)) , 
kr(\ - (1 -/3)e-^) 



(\-Ok{e-^^l^))krn =^ x = 1 + 0^(6"'^'/^). (C.27) 
Let us now return to our original goal of estimating the probability for the event "$T'_i = t_i$" in (C.21). Recall
that $T'_i$ is the sum of $D_i - D'_i$ independent variables, all distributed like $\mathrm{Bin}_{1,3k}(i, \lambda/kr)$. We will apply
Lemma IaTT] First of all, note that 

= Pr[l<Bin(.,AAr)<3fc] = Vr ^ ^'^^^ 
and similarly, since i = &{l)kr, that 

= Var[Bini,3fc(i,A/fcr)] = 0(1) ^ = 0(A). 

Thus, the event "i;' = W is equivalent to "7;' = (A - D9(E[Bini,3fc(i, A/Zcr)] + 0kik^/'^2~'')a)". By 
applying Lemma IaTT] we arrive at 



n PrK' = k] = exp < 

i: \i—kr\<kVkr 



J2 + 0{c6^)){D, -D'j\= exp{-Ofc(M-'=)n}. 

,i: |fi— fcr|<fc\/fcr 



Combining this result with Equations (C.18)-(C.21) and (C.23)-(C.26) yields (C.13), as desired.

D Proof of Lemma C.5
D.1 Outline

Let $\sigma = \mathbf{1}$ be the all-true assignment and let $d$ be a degree sequence chosen from the distribution $D$. Let $U$
be the event that $\sigma$ is a $\beta$-good solution. Furthermore, let $U'$ be the event that $\sigma$ is a solution that satisfies
conditions 1. and 2. in Definition 7.

Fact D.1. Let $d$ be a degree sequence chosen from the distribution $D$. Then $P[U] \sim P[U']$ w.h.p.

Proof. This is a direct consequence of Corollary C.18. □

Let $Z'_\beta(t)$ be the number of solutions $\tau$ such that $\mathrm{dist}(\sigma, \tau) = t$ that satisfy conditions 1. and 2. in
Definition 7. Moreover, let $Z'_\beta$ be the number of all solutions $\tau$ that satisfy conditions 1. and 2. in Definition 7.
For $0 \le t \le n/2$ we let

$$\mu(t) = E[Z'_\beta(t) \mid U'].$$
The main step of the proof lies in establishing the following proposition. 

Proposition D.2. There is a constant $c = c(k) > 0$ such that for $d$ chosen from $D$ the following two
statements hold w.h.p.

1. We have $\mu(n/2) \le \frac{c}{\sqrt{n}} \cdot E[Z'_\beta]$.

2. For any $\alpha \in [\frac{1}{2} - 2^{-k/3}, \frac{1}{2}]$ we have $\mu(\alpha n) \le \exp[-c(\alpha - \frac{1}{2})^2 n] \cdot \mu(n/2)$.

Proof of Lemma C.5 (assuming Proposition D.2). By Fact D.1 we have w.h.p.



(i-2-''/3)n<t<n/2 (i-2-'=/3)„<(<„/2 

< E E[4(t)|i;'] 

(i_2-fc/3)„<(<„/2 
(l_2-fc/3)„<(<„/2 

$\le c'\sqrt{n} \cdot \mu(n/2)$ [by Proposition D.2 part 2, with $c' = c'(k) > 1$]

$\le cc' \cdot E[Z'_\beta]$ [by Proposition D.2 part 1]

$\le (1 + o(1))\,cc'\,E[Z_\beta]$ [by Fact D.1],

as desired. □
The following subsections are devoted to the proof of Proposition D.2.



D.2 The probabilistic framework 

Recall that we denote the clauses of a $k$-CNF formula $\Phi$ by $\Phi_1, \ldots, \Phi_m$, i.e., $\Phi = \Phi_1 \wedge \cdots \wedge \Phi_m$. Furthermore,
for each clause $\Phi_i$ we let $\Phi_{i1}, \ldots, \Phi_{ik}$ signify the literals that the clause consists of, i.e., $\Phi_i = \Phi_{i1} \vee \cdots \vee \Phi_{ik}$.
We are going to break down $\mu(t)$ into a sum of different terms of various types. This requires a few
definitions and a bit of notation. Given the sequence $d = (d_x)_{x \in V}$ chosen from the distribution $D$, we let

$$B = \bigcup_{x \in V} \{x\} \times [d_x],$$

where $[d_x] = \{1, 2, \ldots, d_x\}$. We think of the elements of $B$ as "balls", so that $B$ contains $d_x$ balls $(x, j)$,
$j \in [d_x]$, associated with each variable $x$. A configuration is a bijection $\pi : B \to [m] \times [k]$. Furthermore, a
signature is a map $s : [m] \times [k] \to \{\pm 1\}$.

A configuration $\pi$ and a signature $s$ give rise to a formula $\Phi(s, \pi)$ as follows: for each $(i, j) \in [m] \times [k]$

- $\Phi(s, \pi)_{ij}$ is a positive literal if $s(i, j) = 1$ and a negative literal if $s(i, j) = -1$,

- the variable underlying $\Phi(s, \pi)_{ij}$ is the variable $x$ such that $(i, j) \in \pi(\{x\} \times [d_x])$.

We let $\boldsymbol{\pi}$ denote a configuration chosen uniformly at random, and we let $\boldsymbol{s}$ denote a signature chosen
uniformly at random and independently of $\boldsymbol{\pi}$.

Fact D.3. For any event $\mathcal{E}$ we have $P[\Phi_d \in \mathcal{E}] = P[\Phi(\boldsymbol{s}, \boldsymbol{\pi}) \in \mathcal{E}]$.

Proof. For each formula $\Phi$ with degree sequence $d$ there are precisely $\prod_{x \in V} d_x!$ pairs $(s, \pi)$ such that
$\Phi = \Phi(s, \pi)$. □

Thus, from now on we may work with the random formula $\Phi(\boldsymbol{s}, \boldsymbol{\pi})$ that emerges from choosing a random
configuration and independently a signature. This will be useful because some properties depend only on
the signature, and thus we will be able to treat them independently of the choice of the configuration.
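The construction of a formula from a configuration and a signature can be sketched as follows. The function name `random_formula`, the dictionary-based encoding of the degree sequence and the representation of a literal as a `(sign, variable)` pair are assumptions made for illustration; the sketch requires that the degrees sum to a multiple of $k$.

```python
import random

def random_formula(degrees, k, seed=0):
    """degrees: dict variable -> d_x, with sum of degrees divisible by k.
    Returns a list of m clauses; each clause is a list of k (sign, variable) pairs."""
    rng = random.Random(seed)
    # The set B of balls (x, j) with j in [d_x].
    balls = [(x, j) for x, d in degrees.items() for j in range(1, d + 1)]
    assert len(balls) % k == 0
    m = len(balls) // k
    positions = [(i, j) for i in range(m) for j in range(k)]
    rng.shuffle(positions)                    # a uniformly random bijection pi
    pi = dict(zip(balls, positions))
    # An independent uniformly random signature s.
    signature = {pos: rng.choice((1, -1)) for pos in positions}
    formula = [[None] * k for _ in range(m)]
    for (x, _), (i, j) in pi.items():
        formula[i][j] = (signature[(i, j)], x)
    return formula
```

Because the signature is drawn independently of the configuration, any property that depends only on the signs can be analysed without reference to which variables occupy which positions, which is the point made in the paragraph above.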

Let $g : B \to \{\text{red}, \text{blue}\}$ be a map that assigns a color to each ball. For each variable $x$ we let

$$\mathrm{red}_x(g) = |\{j \in [d_x] : g(x, j) = \text{red}\}|, \qquad \mathrm{blue}_x(g) = |\{j \in [d_x] : g(x, j) = \text{blue}\}|.$$

Furthermore, for a pair $(g_\sigma, g_\tau)$ of maps $B \to \{\text{red}, \text{blue}\}$ and $\tau \in \{0, 1\}^V$ we say that $(\sigma, \tau)$ is $(g_\sigma, g_\tau)$-valid
for a formula $\Phi$ if the following conditions are satisfied.
• Under $\sigma$ each variable $x$ supports precisely $\mathrm{red}_x(g_\sigma)$ clauses.

• Under $\tau$ each variable $x$ supports precisely $\mathrm{red}_x(g_\tau)$ clauses.

• The number of clauses that any $x$ supports under both $\sigma, \tau$ is $|\{j \in [d_x] : g_\sigma(x, j) = g_\tau(x, j) = \text{red}\}|$.

Let $s$ be a signature and let $\pi$ be a configuration. We call an assignment $\tau \in \{0, 1\}^V$ $g$-valid for $(s, \pi)$ if
the following two conditions are satisfied.

• $\tau \in \mathcal{S}(\Phi(s, \pi))$.

• For any $(i, j) \in B$ the following is true. Let $(u, v) = \pi(i, j)$. Then $g(i, j) = \text{red}$ iff $|\Phi(s, \pi)_{uv}|$
supports $\Phi(s, \pi)_u$.

In words, $\tau$ is $g$-valid for $(s, \pi)$ if $\tau$ is a solution of the formula $\Phi(s, \pi)$ induced by $s, \pi$, and if each ball
that is colored red under $g$ supports the clause that it is mapped to under $\pi$, and vice versa.

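Fact D.4. Let $g_\sigma, g_\tau : B \to \{\text{blue}, \text{red}\}$. Then

$$P[(\sigma, \tau) \text{ is } (g_\sigma, g_\tau)\text{-valid for } \Phi(\boldsymbol{s}, \boldsymbol{\pi})] = P[\sigma \text{ is } g_\sigma\text{-valid and } \tau \text{ is } g_\tau\text{-valid for } (\boldsymbol{s}, \boldsymbol{\pi})].$$

Proof. Let $\Phi$ be a formula such that $(\sigma, \tau)$ is $(g_\sigma, g_\tau)$-valid for $\Phi$. Then the total number of pairs $(s, \pi)$ with
$\Phi = \Phi(s, \pi)$ such that $\sigma$ is $g_\sigma$-valid and $\tau$ is $g_\tau$-valid for $(s, \pi)$ equals

$$\prod_{x \in V} \prod_{c, c' \in \{\text{red}, \text{blue}\}} \big|(\{x\} \times [d_x]) \cap g_\sigma^{-1}(c) \cap g_\tau^{-1}(c')\big|!,$$

a term that is independent of $\Phi$. □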

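A profile $\mathcal{C}$ consists of two maps $g_\sigma, g_\tau : B \to \{\text{blue}, \text{red}\}$ and a set $\Gamma \subset g_\sigma^{-1}(\text{blue}) \cap g_\tau^{-1}(\text{red})$ such
that $|g_\sigma^{-1}(\text{red})| = |g_\tau^{-1}(\text{red})| = \lambda n$ and such that $\mathrm{red}_x(g_\sigma), \mathrm{red}_x(g_\tau) \le 3k$ for all $x \in V$.

Let $\mathcal{C}$ be a profile. Moreover, let $\tau \in \{0, 1\}^V$, let $s$ be a signature, and let $\pi$ be a configuration. We say
that $(\sigma, \tau, s, \pi)$ is $\mathcal{C}$-valid if the following conditions are satisfied.

1. $\sigma$ is $g_\sigma$-valid and $\tau$ is $g_\tau$-valid for $(s, \pi)$.

2. Let $(x, l) \in g_\sigma^{-1}(\text{blue}) \cap g_\tau^{-1}(\text{red})$. Let $(i, j) = \pi(x, l)$. Then $(x, l) \in \Gamma$ iff $\Phi(s, \pi)_i$ is $\sigma$-critical.

In words, this means that $(\sigma, \tau, s, \pi)$ is $\mathcal{C}$-valid if $\sigma, \tau$ are solutions of the formula $\Phi(s, \pi)$ under which
the colors assigned to the literals by $g_\sigma, g_\tau$ "work out" (i.e., a ball is red iff $\pi$ puts it in a place such that
it supports the clause it occurs in), and if a ball $(x, j)$ belongs to $\Gamma$ if it supports a clause under $\tau$ that is
supported by another ball under $\sigma$.

Let $\mathcal{P}$ be the set of all profiles. For any $\mathcal{C} \in \mathcal{P}$ and any $t$ let

$$\mu_{\mathcal{C}}(t) = E\,\big|\{\tau \in \{0, 1\}^V : \mathrm{dist}(\sigma, \tau) = t \text{ and } (\sigma, \tau, \boldsymbol{s}, \boldsymbol{\pi}) \text{ is } \mathcal{C}\text{-valid}\}\big|,$$
where the expectation is taken over $\boldsymbol{s}, \boldsymbol{\pi}$.
Fact D.5. We have

$$\mu(t) = \frac{\sum_{\mathcal{C} \in \mathcal{P}} \mu_{\mathcal{C}}(t)}{2^{-n} E[Z'_\beta]}. \tag{D.1}$$

Proof. The denominator equals the probability that $\sigma$ is a NAE-solution that satisfies the first two conditions
in Definition 7. Furthermore, $\mu_{\mathcal{C}}(t)$ accounts for the probability that the pair $(\sigma, \tau)$ is $\mathcal{C}$-valid, because for
any $s, \pi$ and any $\tau$ there is no more than one profile $\mathcal{C} \in \mathcal{P}$ such that $(\sigma, \tau, s, \pi)$ is $\mathcal{C}$-valid. Hence, (D.1)
follows from Facts D.3 and D.4. □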
We call a profile $\mathcal{C} = (g_\sigma, g_\tau, \Gamma)$ good if
1 



n 



I ^ (red) n (red) I G 



k 3k 



3 . 2fc ' 2^=^ 



and - irl G 

n 



fc2 3k^ 



3 . 2fc ' 2^= 



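Let $\mathcal{P}_g$ be the set of all good profiles, and let $\mathcal{P}_b = \mathcal{P} \setminus \mathcal{P}_g$. Furthermore, let

$$\mu_b(t) = \sum_{\mathcal{C} \in \mathcal{P}_b} \mu_{\mathcal{C}}(t).$$

In Appendix D.3 we are going to show the following.

Proposition D.6. W.h.p. the degree sequence $d$ chosen from $D$ is such that

$$\sum_{(\frac{1}{2} - 2^{-k/3})n \le t \le \frac{n}{2}} \mu_b(t) = o(1).$$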

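Furthermore, in Appendix D.4 we are going to prove

Proposition D.7. W.h.p. the degree sequence $d$ chosen from $D$ has the following property. Let $\mathcal{C} \in \mathcal{P}_g$ and
let $\frac{1}{2} - 2^{-k/3} \le \alpha \le \frac{1}{2}$. Then

$$\mu_{\mathcal{C}}(\alpha n) \le \exp\left[-c\left(\alpha - \frac{1}{2}\right)^2 n\right] \mu_{\mathcal{C}}(n/2) + \exp(-\Omega(n))$$

for a certain $c = c(k) > 0$.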

We will also need the following fact. 

Proposition D.8. W.h.p. the degree sequence $d$ chosen from $D$ is such that $\mu(n/2) \le \frac{c}{\sqrt{n}} E[Z'_\beta]$ for a
certain $c = c(k) > 0$.

Proof. Note that by (D.1) the claim is equivalent to showing

$$\sum_{\mathcal{C} \in \mathcal{P}} \mu_{\mathcal{C}}(n/2) \le c\,n^{-1/2}\,2^{-n} E[Z'_\beta]^2.$$

However, since $E[Z'_\beta]$ is the sum of the expectations of indicator random variables over all possible
assignments, by expanding $E[Z'_\beta]^2$ we arrive at an expression that is a sum over all profiles $\mathcal{C} \in \mathcal{P}$. Then the
result follows essentially by performing a term-by-term comparison with the left-hand side of the above
inequality. □

Proposition D.2 is an immediate consequence of (D.1) and Propositions D.6, D.7 and D.8.



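D.3 Proof of Proposition D.6

Let $\Phi$ be a $k$-CNF and let $\sigma, \tau \in \{0, 1\}^V$. We say that $(i, j) \in [m] \times [k]$ is $\sigma$-red if $\Phi_{ij}$ supports $\Phi_i$
under $\sigma$. Let $\mathrm{red}(\sigma, \Phi)$ be the set of all $\sigma$-red pairs $(i, j)$. We define the term $\sigma$-blue and the set $\mathrm{blue}(\sigma, \Phi)$
analogously. Furthermore, let $\Gamma(\sigma, \tau, \Phi)$ be the set of all $(i, j)$ such that $(i, j) \in \mathrm{blue}(\sigma, \Phi) \cap \mathrm{red}(\tau, \Phi)$
while $\Phi_i$ is critical under $\sigma$.

Finally, we call the pair $(\sigma, \tau) \in \mathcal{S}(\Phi)^2$ bad if $(\frac{1}{2} - 2^{-k/3})n \le \mathrm{dist}(\sigma, \tau) \le n/2$ and one of the following
conditions holds: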






|red((T,(P) n red(T,^)| [ 

3-2'^ ' 2'= 



|r(f7,r,^)| 



fcn 3-fcn 1 
3-2'= ' 2* J ' 



Lemma D.9. Let $B$ be the number of bad pairs $(\sigma, \tau) \in \mathcal{S}(\Phi)^2$. Then $E[B] = \exp(-\Omega(n))$.

Proo/ Let fj = 1 and let a e [\ - 2"'=/^, i] . Let S{a) be the event that cj, r G As shown in IS, we 

have 

$$P[S] = \left(1 - 2^{2-k} + 2^{1-k}\left(\alpha^k + (1 - \alpha)^k\right)\right)^m.$$
Let $R = |\mathrm{red}(\sigma, \Phi) \cap \mathrm{red}(\tau, \Phi)|$. Given that $S$ occurs, $R$ has a binomial distribution

$$\mathrm{Bin}\left(m, \frac{k(\alpha^k + (1 - \alpha)^k)}{(2^{k-1} - 1)(1 - \alpha(1 - \alpha)^{k-1} - (1 - \alpha)\alpha^{k-1})}\right).$$

For given that $\sigma$ is a solution, there are a total of $2^k - 2$ ways to choose the signs of the $k$ literals in any
clause, and precisely $2k$ ways to choose the signs so that the clause is critical under $\sigma$. Given that it is,
there are $n^k(1 - \alpha(1 - \alpha)^{k-1} - (1 - \alpha)\alpha^{k-1})$ ways to choose the actual variables that occur in the clause
so as to ensure that $\tau$ is a solution, too. (Namely, we have to avoid that either $\tau$ and $\sigma$ differ on the
$\sigma$-supporting variable only, or that they agree on the $\sigma$-supporting variable only; furthermore, the probability
that $\sigma, \tau$ differ on a randomly chosen variable is equal to $\alpha$.) Finally, given that a given clause is $\sigma$-critical,
the probability that the clause is critical under $\tau$ and supported by the same variable as under $\sigma$ is equal to
$\alpha^k + (1 - \alpha)^k$ (for $\sigma, \tau$ would either have to agree or disagree on all the $k$ variables).
Further, let $G = |\Gamma(\sigma, \tau, \Phi)|$. Given that $S$ occurs, $G$ is a binomial variable

$$\mathrm{Bin}\left(m, \frac{k(k - 1)\left(\alpha^2(1 - \alpha)^{k-2} + \alpha^{k-2}(1 - \alpha)^2\right)}{(2^{k-1} - 1)(1 - \alpha(1 - \alpha)^{k-1} - (1 - \alpha)\alpha^{k-1})}\right).$$

For in each $\sigma$-critical clause there are $k - 1$ ways to choose another literal $j$ to support that clause under $\tau$,
and to materialize this choice, $\tau$ has to either disagree with $\sigma$ on the $\sigma$-supporting literal and on literal $j$ and
agree on all other literals, or the inverse configuration must occur.
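To get a feeling for the size of the two success probabilities, one can evaluate them at $\alpha = 1/2$. The sketch below takes the parameters to be $k(\alpha^k + (1-\alpha)^k)$ and $k(k-1)(\alpha^2(1-\alpha)^{k-2} + \alpha^{k-2}(1-\alpha)^2)$, each divided by $(2^{k-1}-1)(1 - \alpha(1-\alpha)^{k-1} - (1-\alpha)\alpha^{k-1})$, as read from the text above; the function names and the choice $k = 20$ are assumptions.

```python
def p_red_red(alpha, k):
    """Success probability of the binomial variable R."""
    num = k * (alpha ** k + (1 - alpha) ** k)
    den = (2 ** (k - 1) - 1) * (
        1 - alpha * (1 - alpha) ** (k - 1) - (1 - alpha) * alpha ** (k - 1))
    return num / den

def p_gamma(alpha, k):
    """Success probability of the binomial variable G."""
    num = k * (k - 1) * (
        alpha ** 2 * (1 - alpha) ** (k - 2) + alpha ** (k - 2) * (1 - alpha) ** 2)
    den = (2 ** (k - 1) - 1) * (
        1 - alpha * (1 - alpha) ** (k - 1) - (1 - alpha) * alpha ** (k - 1))
    return num / den

k = 20
print(p_red_red(0.5, k) * 2 ** (2 * k - 2) / k)       # close to 1
print(p_gamma(0.5, k) * 2 ** (2 * k - 2) / k ** 2)    # close to (k - 1) / k
```

At $\alpha = 1/2$ the first parameter is within a factor $1 + O(2^{-k})$ of $k/2^{2k-2}$, and the second equals $(k-1)/k$ times $k^2/2^{2k-2}$ up to the same factor, so both means are of order $km/2^{2k-2}$ and $k^2m/2^{2k-2}$ respectively for large $k$.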
It is easily verified that for any $\alpha \in [\frac{1}{2} - 2^{-k/3}, \frac{1}{2}]$ we have

$$E[R \mid S] \sim \frac{km}{2^{2k-2}}, \qquad E[G \mid S] \sim \frac{k^2 m}{2^{2k-2}}.$$
As R, G\S are binomially distributed, Chernoff bounds yield 

kn 3kn 



G 



kn kn 
2k ' 2^^i 

k'^n k'^n 



2k ' 2^~^ 



Pr 



Pr 



R 



3 • 2*^-1 ' 2^=-! 



3 • 2*=-! ' 2'=-! 
Since the total expected number of pairs of solutions is 



< exp 

< exp 



-^k 



k_ 
2^ 

2k 



n 



n 



(D.2) 
(D.3) 



$$E[Z^2] \le \exp[O_k(2^{-k})n],$$

the bounds (D.2) and (D.3) imply that $E[B] \le \exp(-\Omega(n))$, as claimed.



□ 



Proposition D.6 is an immediate consequence of Lemma D.9, because the experiment of first choosing
$d$ from the distribution $D$ and then generating $\Phi_d$ yields precisely the uniform distribution $\Phi$.

D.4 Proof of Proposition D.7

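Let $\mathcal{C} = (g_\sigma, g_\tau, \Gamma) \in \mathcal{P}_g$. For $c, c' \in \{\text{red}, \text{blue}\}$ let

$$g_{c,c'} = g_{c,c'}(\mathcal{C}) = |g_\sigma^{-1}(c) \cap g_\tau^{-1}(c')|/n, \quad \text{and let} \quad \gamma = \gamma(\mathcal{C}) = |\Gamma|/n.$$

Furthermore, for any $\sigma, \tau \in \{0, 1\}^V$ we define

$$\alpha_{c,c'} = \alpha_{c,c'}(\sigma, \tau, \mathcal{C}) = \frac{|\{(x, i) \in g_\sigma^{-1}(c) \cap g_\tau^{-1}(c') : \sigma(x) = \tau(x)\}|}{|g_\sigma^{-1}(c) \cap g_\tau^{-1}(c')|},$$

$$\alpha_\Gamma = \alpha_\Gamma(\sigma, \tau, \mathcal{C}) = \frac{|\{(x, i) \in \Gamma : \tau(x) = \sigma(x)\}|}{|\Gamma|},$$

$$\boldsymbol{\alpha} = \boldsymbol{\alpha}(\sigma, \tau, \mathcal{C}) = (\alpha_{\text{red},\text{red}}, \alpha_{\text{red},\text{blue}}, \alpha_{\text{blue},\text{red}}, \alpha_{\text{blue},\text{blue}}, \alpha_\Gamma) \in [0, 1]^5.$$

An important observation is that by symmetry, the probability for a pair $(\sigma, \tau)$ to be $\mathcal{C}$-valid is governed by
their "overlap vector" $\boldsymbol{\alpha}$. More precisely, we have

Fact D.10. Let $\mathcal{C} = (g_\sigma, g_\tau, \Gamma) \in \mathcal{P}_g$. Let $\sigma, \tau, \tau' \in \{0, 1\}^V$ be such that $\boldsymbol{\alpha}(\sigma, \tau, \mathcal{C}) = \boldsymbol{\alpha}(\sigma, \tau', \mathcal{C})$. Then

$$P[(\sigma, \tau, \boldsymbol{s}, \boldsymbol{\pi}) \text{ is } \mathcal{C}\text{-valid}] = P[(\sigma, \tau', \boldsymbol{s}, \boldsymbol{\pi}) \text{ is } \mathcal{C}\text{-valid}].$$
Fact D.10 motivates the following definition: for $\boldsymbol{\alpha} = \boldsymbol{\alpha}(\sigma, \tau, \mathcal{C})$ we let

$$p_{\mathcal{C}}(\boldsymbol{\alpha}) = P[(\sigma, \tau, \boldsymbol{s}, \boldsymbol{\pi}) \text{ is } \mathcal{C}\text{-valid}].$$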
For a real a G (0, 1) we call a vector ol = (ored.red, • • •) a-tame if 

|'^red,red ^ 

|^red,blue ^ 

|^blue,red ^ 
|ciblue,blue 

\ar — 01 



< 10/VA;, 

< 2^*^/^ 

< 2"''/^ and 

< 100//C. 



Let T{a) be the set of all a-tame vectors. The following lemma shows that we can neglect "overlap vectors" 
ct that are not tame. 

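Lemma D.11. Let $\mathcal{C} = (g_\sigma, g_\tau, \Gamma) \in \mathcal{V}_g$. Let $W$ be the number of pairs $(\sigma, \tau) \in S(\Phi)^2$ with $1 - \alpha = \mathrm{dist}(\sigma,\tau)/n \in [\frac12 - 2^{-k/2}, \frac12]$ and such that there is a profile $\mathcal{C}$ such that $\boldsymbol{\alpha}(\sigma, \tau, \mathcal{C}) \notin T(\alpha)$. Then $E[W] = \exp(-\Omega(n))$.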

The proof of Lemma D.11 is based on a similar first moment argument as in the proof of Lemma D.9.
Furthermore, in Section D.5 we will establish the following.

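Lemma D.12. Let $\mathcal{C} = (g_\sigma, g_\tau, \Gamma) \in \mathcal{V}_g$. Let $\boldsymbol{\alpha} \in T(\alpha)$ for some $\alpha \in [\frac12 - 2^{-k/2}, \frac12]$. Letting $\boldsymbol{\delta} = \boldsymbol{\alpha} - \frac12 \mathbf{1}$,
we have

\[
\begin{aligned}
\frac{1}{n} \ln \frac{p_{\mathcal{C}}(\boldsymbol{\alpha})}{p_{\mathcal{C}}(\frac12 \mathbf{1})}
\le{}& O_k(k) \cdot \left[ g_{\mathrm{red,red}} \left( \delta_{\mathrm{red,red}} \delta_{\mathrm{blue,blue}} + \delta_{\mathrm{blue,blue}}^2 \right) + \gamma \left( \delta_\Gamma \delta_{\mathrm{blue,blue}} + \delta_{\mathrm{blue,blue}}^2 \right) \right] \\
&+ O_k\!\left( \frac{k^4}{2^k} \right) \left[ \delta_{\mathrm{red,blue}} \delta_{\mathrm{blue,blue}} + \delta_{\mathrm{blue,red}} \delta_{\mathrm{blue,blue}} + \delta_{\mathrm{blue,blue}}^2 \right].
\end{aligned}
\]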



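For a number $\alpha \in [\frac12 - 2^{-k/2}, \frac12]$ let $p_{\mathcal{C}}(\alpha)$ be the probability that for a random $\tau \in \{0,1\}^V$ with
$\mathrm{dist}(\sigma, \tau) = \alpha n$ we have $\boldsymbol{\alpha}(\sigma, \tau, \mathcal{C}) \in T(\alpha)$ and $(\sigma, \tau, s, \pi)$ is $\mathcal{C}$-valid. We will derive the following
consequence of Lemma D.12 in Section D.6.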

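Corollary D.13. Suppose that $\alpha \in [\frac12 - 2^{-k/2}, \frac12]$ and let $\mathcal{C}$ be a good profile. Then

\[ p_{\mathcal{C}}(\alpha) \le p_{\mathcal{C}}(1/2) \cdot \exp\left[ O_k(k^4/2^k) \cdot \left(\alpha - \tfrac12\right)^2 \cdot n \right] + \exp(-\Omega(n)). \]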



Proof of Proposition D.7. By Proposition D.6 and Lemma D.11, for a random $d$ chosen from $D$ we have
w.h.p.

/^c(«) < ( Vc(a) + o(l)- 
\an J 

Thus, it suffices to estimate $\binom{n}{\alpha n} p_{\mathcal{C}}(\alpha)$. By Stirling's formula and Corollary D.13,

vMl/2)J-n l^(„>c(l/2)J 



n 



< - (4 - Ofc(a - 1/2) - Ok{k^/2^)) • (a - i 

1X2 



whence the assertion follows for $k \ge k_0$ sufficiently large. □
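The Stirling step can be illustrated numerically. The sketch below assumes only the standard facts that $\frac1n \ln\binom{n}{\alpha n}$ converges to the entropy $H(\alpha) = -\alpha\ln\alpha - (1-\alpha)\ln(1-\alpha)$ and that $H(\frac12 + \delta) \le \ln 2 - 2\delta^2$. It covers the binomial factor only; the remaining terms of the estimate come from Corollary D.13 and are not reproduced here.

```python
import math

def log_binom_rate(n, a):
    """(1/n) * ln C(n, round(a*n)), computed via log-gamma."""
    j = round(a * n)
    return (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)) / n

def entropy(a):
    """Entropy H(a) in nats."""
    return -a * math.log(a) - (1 - a) * math.log(1 - a)

def quadratic_upper_bound(a):
    """ln 2 - 2 (a - 1/2)^2, the second-order expansion of H around 1/2."""
    return math.log(2) - 2 * (a - 0.5) ** 2

n = 10 ** 6
for a in (0.5, 0.45, 0.4):
    print(a, log_binom_rate(n, a), entropy(a), quadratic_upper_bound(a))
```

Moving the overlap away from $\frac12$ by $\delta$ therefore costs a factor of roughly $\exp(-2\delta^2 n)$ in the number of candidate assignments, which is the quadratic decay that has to dominate the gain allowed by the corollary.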
D.5 Proof of Lemma D.12

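A map $f : [m] \times [k] \to \{\mathrm{red}, \mathrm{blue}\}$ is called a coloring if for each $i \in [m]$ there is at most one $j \in [k]$ such
that $f(i,j) = \mathrm{red}$. Let $f_\sigma, f_\tau$ be colorings. We say that the pair $f = (f_\sigma, f_\tau)$ is compatible with a profile
$\mathcal{C} = (g_\sigma, g_\tau, \Gamma)$ if

\[
\begin{aligned}
|g_\sigma^{-1}(c) \cap g_\tau^{-1}(c')| &= |f_\sigma^{-1}(c) \cap f_\tau^{-1}(c')| \quad \text{for any } c, c' \in \{\mathrm{red}, \mathrm{blue}\}, \\
|\Gamma| &= |\{ i \in [m] : \exists j \ne l : f_\sigma(i,j) = \mathrm{red} \wedge f_\tau(i,l) = \mathrm{red} \}|.
\end{aligned}
\]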
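The two counting conditions above can be sketched in code. This is an illustration under assumptions: colorings are stored as dictionaries keyed by positions `(i, j)` with 0-based indices (the text uses $[m] \times [k]$), and the helper names `make_coloring`, `is_coloring`, `cell_sizes` and `doubly_critical_clauses` are hypothetical.

```python
def make_coloring(m, k, red_positions):
    """Build a map [m] x [k] -> {red, blue} with the given red positions."""
    f = {(i, j): "blue" for i in range(m) for j in range(k)}
    for pos in red_positions:
        f[pos] = "red"
    return f

def is_coloring(f, m, k):
    """True if every clause i has at most one red position (i, j)."""
    return all(sum(1 for j in range(k) if f[(i, j)] == "red") <= 1
               for i in range(m))

def cell_sizes(f_sigma, f_tau, m, k):
    """Number of positions with color c under f_sigma and c' under f_tau."""
    sizes = {(c, c2): 0 for c in ("red", "blue") for c2 in ("red", "blue")}
    for i in range(m):
        for j in range(k):
            sizes[(f_sigma[(i, j)], f_tau[(i, j)])] += 1
    return sizes

def doubly_critical_clauses(f_sigma, f_tau, m, k):
    """Clauses i with f_sigma(i, j) = red and f_tau(i, l) = red for some j != l."""
    return [i for i in range(m)
            if any(f_sigma[(i, j)] == "red" and f_tau[(i, l)] == "red"
                   for j in range(k) for l in range(k) if j != l)]

m, k = 3, 3
f_sigma = make_coloring(m, k, [(0, 0), (1, 1)])
f_tau = make_coloring(m, k, [(0, 2), (1, 1), (2, 0)])
sizes = cell_sizes(f_sigma, f_tau, m, k)
critical = doubly_critical_clauses(f_sigma, f_tau, m, k)
print(sizes, critical)
```

A pair of colorings is compatible with a profile when `sizes` matches the four cell sizes of the profile and `len(critical)` matches $|\Gamma|$.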

Let $f$ be a coloring and let $t : [m] \times [k] \to \{0, 1\}$ be a map. We call $(f, t)$ valid for a signature $s$ if the
following two conditions are satisfied:

• for any i G [m] there exist j, I G [A:] such that s(z, 7^ 

• if /(i, j) = red, then for all / G [k] \ {3} we have s(i, / s(i, 

Intuitively, this means that any formula in which the signs are given by $s$ is NAE-satisfied if for all $(i,j) \in
[m] \times [k]$ the literal in position $(i, j)$ takes the value $t(i,j)$. Furthermore, for each $(i, j)$ with $f(i,j) = \mathrm{red}$
the literal in position $(i, j)$ supports clause $i$ if the truth values are given by $t$.
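The intuition behind validity can be spelled out for a single clause. The sketch below uses an assumed convention that is not taken from the paper: signs are $\pm 1$, values are $0/1$, and a literal with sign $-1$ evaluates to the negation of its value. Under that convention a clause is NAE-satisfied if its literals are not all equal, and a literal supports the clause if it is the only one with its truth value (meaningful for $k \ge 3$).

```python
def literal_values(signs, values):
    """Truth value of each literal: the value itself, negated if the sign is -1."""
    return [v if s > 0 else 1 - v for s, v in zip(signs, values)]

def nae_satisfied(signs, values):
    """True if the clause has at least one true and at least one false literal."""
    lits = literal_values(signs, values)
    return 0 < sum(lits) < len(lits)

def supporting_literal(signs, values):
    """Index of the unique literal whose truth value differs from all others, else None."""
    lits = literal_values(signs, values)
    ones = sum(lits)
    if ones == 1:
        return lits.index(1)
    if ones == len(lits) - 1:
        return lits.index(0)
    return None

print(nae_satisfied([1, -1, 1], [1, 1, 1]), supporting_literal([1, -1, 1], [1, 1, 1]))
```

In this language, a red position is one for which `supporting_literal` returns its index, and a clause without red positions must be NAE-satisfied with `supporting_literal` returning `None`.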

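Let $\boldsymbol{\alpha} \in [0,1]^5$ be a vector. Let $f = (f_\sigma, f_\tau)$ be a pair of colorings. Let $t : [m] \times [k] \to \{0, 1\}$. We call
$(f, t)$ compatible with $\boldsymbol{\alpha}$ if

\[
\alpha_{c,c'} = \frac{|t^{-1}(1) \cap f_\sigma^{-1}(c) \cap f_\tau^{-1}(c')|}{|f_\sigma^{-1}(c) \cap f_\tau^{-1}(c')|} \quad \text{for all } c, c' \in \{\mathrm{red}, \mathrm{blue}\}, \text{ and}
\]

\[
\alpha_\Gamma = \frac{|t^{-1}(1) \cap \{ (i,l) \in [m] \times [k] : \exists j \ne l : f_\sigma(i,j) = \mathrm{red} \wedge f_\tau(i,l) = \mathrm{red} \}|}{|\{ i \in [m] : \exists j \ne l : f_\sigma(i,j) = \mathrm{red} \wedge f_\tau(i,l) = \mathrm{red} \}|}.
\]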
Let $t : [m] \times [k] \to \{0, 1\}$ be uniformly distributed, and let

\[ q_f(\boldsymbol{\alpha}) = P_{s,t}\left[ (f, t) \text{ is valid for } s \mid (f, t) \text{ is compatible with } \boldsymbol{\alpha} \right]. \]

Fact D.14. Suppose that $f$ is compatible with a profile $\mathcal{C}$. Then for any $\boldsymbol{\alpha}$ we have $p_{\mathcal{C}}(\boldsymbol{\alpha}) = q_f(\boldsymbol{\alpha})$.

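Proof. Let $t : [m] \times [k] \to \{0,1\}$ be such that $(f, t)$ is compatible with $\boldsymbol{\alpha}$. Let $\tau \in \{0, 1\}^V$ be such that $\boldsymbol{\alpha} =
\boldsymbol{\alpha}(\sigma, \tau, \mathcal{C})$. Let $\Pi$ be the set of all $\pi : \{(x,i) : x \in V,\ i \in [d_x]\} \to [m] \times [k]$ such that $t(\pi(x, i)) = \tau(x)$ for all $x \in V$, $i \in [d_x]$.
Then $\Pi$ consists of all $\pi$ that map the right "type" of "ball" to each position. Therefore,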

l-^l = {{o:rgrn)\{{l - ar)grn)\ ■ ((ared,red5red,red?^)!((l - ared,red)5'red,red?^)! 

•((ared,blue5'red,blue^^ " argrny.{{l - ared,biue)5'red,biue"- " (1 " ar)grn)\ 
•((ablue,red5blue,red^^ " OLrgrn)\{{l - abiue,red)fl'biue,red"- " (1 " a.r)grn)\ 
•((ablue,bluefl'blue,blue"- - Oirgrn)\{{l CKblue,blue )5biue,biue'i- " (1 " oir)gr'n)\ . 

Hence, $|\Pi|$ is independent of the actual map $t$, which implies the assertion. □

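Thus, we are left to compute $q_f(\boldsymbol{\alpha})$ for a fixed pair $f = (f_\sigma, f_\tau)$ of colorings that is compatible with the
good profile $\mathcal{C}$. To facilitate this computation, we simplify the random experiment further. Namely, let

\[ \mathcal{R} = \{ (i,j) \in [m] \times [k] : f(i,j) \ne (\mathrm{blue}, \mathrm{blue}) \}, \qquad \mathcal{B} = [m] \times [k] \setminus \mathcal{R}. \]

For maps $t_{\mathrm{red}} : \mathcal{R} \to \{0,1\}$ and $t_{\mathrm{blue}} : \mathcal{B} \to \{0, 1\}$ we let $t_{\mathrm{red}} \cup t_{\mathrm{blue}} : [m] \times [k] \to \{0,1\}$ be the map defined by

\[
(t_{\mathrm{red}} \cup t_{\mathrm{blue}})(i,j) =
\begin{cases}
t_{\mathrm{red}}(i,j) & \text{if } (i,j) \in \mathcal{R}, \\
t_{\mathrm{blue}}(i,j) & \text{if } (i,j) \in \mathcal{B}.
\end{cases}
\]

Furthermore, we say that $(f, t_{\mathrm{red}})$ is compatible with $\boldsymbol{\alpha}$ if there exists $t_{\mathrm{blue}}$ such that $(f, t_{\mathrm{red}} \cup t_{\mathrm{blue}})$ is
compatible with $\boldsymbol{\alpha}$.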

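Suppose that $(f, t_{\mathrm{red}})$ is compatible with $\boldsymbol{\alpha}$. Let $t_{\mathrm{blue}} : \mathcal{B} \to \{0, 1\}$ be obtained by setting $t_{\mathrm{blue}}(i,j) =
1$ with probability $\alpha_{\mathrm{blue,blue}}$ and $t_{\mathrm{blue}}(i,j) = 0$ with probability $1 - \alpha_{\mathrm{blue,blue}}$ independently for all
$(i,j) \in \mathcal{B}$. Furthermore, let

\[ q_f(\boldsymbol{\alpha}, t_{\mathrm{red}}) = P\left[ (f, t_{\mathrm{red}} \cup t_{\mathrm{blue}}) \text{ is valid for } s \mid (f, t_{\mathrm{red}} \cup t_{\mathrm{blue}}) \text{ is compatible with } \boldsymbol{\alpha} \right]. \]

Fact D.15. Suppose that $(f, t_{\mathrm{red}})$ is compatible with $\boldsymbol{\alpha}$. Then $q_f(\boldsymbol{\alpha}) = q_f(\boldsymbol{\alpha}, t_{\mathrm{red}})$.

Lemma D.16. Suppose that $(f, t_{\mathrm{red}})$ is compatible with $\boldsymbol{\alpha}$. There is a number $C = C(k) > 0$ such that

\[ q_f(\boldsymbol{\alpha}, t_{\mathrm{red}}) \le C \cdot P\left[ (f, t_{\mathrm{red}} \cup t_{\mathrm{blue}}) \text{ is valid for } s \right]. \]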

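Proof. We have

\[
\begin{aligned}
q_f(\boldsymbol{\alpha}, t_{\mathrm{red}})
&= P\left[ (f, t_{\mathrm{red}} \cup t_{\mathrm{blue}}) \text{ is valid for } s \mid (f, t_{\mathrm{red}} \cup t_{\mathrm{blue}}) \text{ is compatible with } \boldsymbol{\alpha} \right] \\
&= P\left[ (f, t_{\mathrm{red}} \cup t_{\mathrm{blue}}) \text{ is valid for } s \mid |t_{\mathrm{blue}}^{-1}(1)| = \alpha_{\mathrm{blue,blue}} |\mathcal{B}| \right] \\
&= \frac{P\left[ (f, t_{\mathrm{red}} \cup t_{\mathrm{blue}}) \text{ is valid for } s \wedge |t_{\mathrm{blue}}^{-1}(1)| = \alpha_{\mathrm{blue,blue}} |\mathcal{B}| \right]}{P\left[ |t_{\mathrm{blue}}^{-1}(1)| = \alpha_{\mathrm{blue,blue}} |\mathcal{B}| \right]} \\
&= P\left[ (f, t_{\mathrm{red}} \cup t_{\mathrm{blue}}) \text{ is valid for } s \right] \cdot \frac{P\left[ |t_{\mathrm{blue}}^{-1}(1)| = \alpha_{\mathrm{blue,blue}} |\mathcal{B}| \mid (f, t_{\mathrm{red}} \cup t_{\mathrm{blue}}) \text{ is valid for } s \right]}{P\left[ |t_{\mathrm{blue}}^{-1}(1)| = \alpha_{\mathrm{blue,blue}} |\mathcal{B}| \right]}.
\end{aligned}
\tag{D.4}
\]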
We claim that

\[ P\left[ |t_{\mathrm{blue}}^{-1}(1)| = \alpha_{\mathrm{blue,blue}} |\mathcal{B}| \mid (f, t_{\mathrm{red}} \cup t_{\mathrm{blue}}) \text{ is valid for } s \right] = O(n^{-1/2}). \tag{D.5} \]

For given that $(f, t_{\mathrm{red}} \cup t_{\mathrm{blue}})$ is valid for $s$, $|t_{\mathrm{blue}}^{-1}(1)| - \alpha_{\mathrm{blue,blue}} |\mathcal{B}|$ is the sum of $m$ independent contributions,
as the $t_{\mathrm{blue}}(i,j)$ are independent Bernoulli variables for all $(i,j) \in \mathcal{B}$. Furthermore, given that
$(f, t_{\mathrm{red}} \cup t_{\mathrm{blue}})$ is valid for $s$, for all $i$ such that $\mathrm{red} \notin f_\sigma(\{i\} \times [k]) \cup f_\tau(\{i\} \times [k])$ the random variable
$\sum_{j \in [k]} t_{\mathrm{blue}}(i,j)$ takes any value between 1 and $k$ with non-zero probability. Therefore, the conditional
random variable |tbiue(l)| has a local limit theorem, see Lemma IaTTI and (ID. 51 ) follows. 

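As the unconditional distribution of $|t_{\mathrm{blue}}^{-1}(1)|$ is just a binomial distribution with mean $\alpha_{\mathrm{blue,blue}} |\mathcal{B}|$,
we have

\[ P\left[ |t_{\mathrm{blue}}^{-1}(1)| = \alpha_{\mathrm{blue,blue}} |\mathcal{B}| \right] = \Omega(n^{-1/2}). \]

Combining this with (D.4) and (D.5) yields the assertion. □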
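The order of magnitude of the unconditional probability can be checked directly. The sketch below assumes nothing beyond the standard local limit behaviour of the binomial distribution: for $X \sim \mathrm{Bin}(N, p)$, the probability of hitting the mean is close to $1/\sqrt{2\pi N p(1-p)}$. The value `p = 0.3` is an arbitrary illustration, not a quantity from the proof.

```python
import math

def binom_pmf(n, p, j):
    """P[X = j] for X ~ Bin(n, p), computed in log-space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
               + j * math.log(p) + (n - j) * math.log(1 - p))
    return math.exp(log_pmf)

def scaled_mass_at_mean(n, p):
    """sqrt(n) * P[X = round(n p)], which should approach 1/sqrt(2 pi p (1-p))."""
    return math.sqrt(n) * binom_pmf(n, p, round(n * p))

p = 0.3
limit = 1.0 / math.sqrt(2 * math.pi * p * (1 - p))
scaled = [scaled_mass_at_mean(n, p) for n in (100, 1000, 10000)]
print(scaled, limit)
```

Since the rescaled point mass stays bounded away from zero and infinity, the ratio of the two point probabilities in (D.4) is bounded by a constant, which is what the lemma needs.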
Combining Facts D.14 and D.15 with Lemma D.16, we obtain

Corollary D.17. Suppose that $(f, t_{\mathrm{red}})$ is compatible with $\boldsymbol{\alpha}$. Then

\[ p_{\mathcal{C}}(\boldsymbol{\alpha}) \le C \cdot P\left[ (f, t_{\mathrm{red}} \cup t_{\mathrm{blue}}) \text{ is valid for } s \right]. \]

The crucial feature of the term

\[ P\left[ (f, t_{\mathrm{red}} \cup t_{\mathrm{blue}}) \text{ is valid for } s \right] \]

is that in the underlying random experiment, the clauses are independent objects, although there are different
"types" of clauses. This independence property allows us to derive the following estimate.

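Proposition D.18. Suppose that $(f, t_{\mathrm{red}})$ is compatible with $\boldsymbol{\alpha}$. Let $\mathcal{V}$ be the event that $(f, t_{\mathrm{red}} \cup t_{\mathrm{blue}})$ is
valid for $s$. Let $a = \alpha_{\mathrm{blue,blue}}$. Then

\[ \frac{1}{n} \ln P[\mathcal{V}] = \psi_\sigma + \psi_\Gamma + \sum_{c, c' \in \{\mathrm{red}, \mathrm{blue}\}} \psi_{c,c'}, \tag{D.6} \]

with the $\psi$s as shown in Figure 1.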

Proof. The first summand $\psi_\sigma$ accounts for the probability that $\sigma = \mathbf{1}$ is a NAE-solution and that precisely
the clauses $i$ such that $f_\sigma(i,j) = \mathrm{red}$ for some $j \in [k]$ are $\mathbf{1}$-critical. There are precisely $\lambda n$ such clauses,
and for each of them the probability of being critical with supporting literal $j$ equals $2^{1-k}$. Furthermore,
for the $(r - \lambda)n$ other clauses the probability of being non-critical but NAE-satisfied equals $1 - (k+1)2^{1-k}$.
Since these events depend on the signs of the literals only, they occur independently for all clauses, which
explains $\psi_\sigma$.

The $\psi_{\mathrm{red,red}}$ term is derived quite easily as well. The number of positions $(i,j)$ such that $f_\sigma(i,j) =
f_\tau(i,j) = \mathrm{red}$ equals $g_{\mathrm{red,red}} n$. There are precisely $\alpha_{\mathrm{red,red}} g_{\mathrm{red,red}} n$ among these such that $t_{\mathrm{red}}(i,j) = 1$.
Each such position $(i, j)$ supports its clause under $t$ iff $t(i, l) = 1$ for all $l \in [k] \setminus \{j\}$. By the construction
of $t$, the probability of this event is $a^{k-1}$. Similarly, the "success probability" is $(1-a)^{k-1}$ for all $(i, j)$ with
$t_{\mathrm{red}}(i,j) = 0$.

The next factor $\psi_\Gamma$ accounts for the number of $(i,j) \in f_\tau^{-1}(\mathrm{red}) \cap f_\sigma^{-1}(\mathrm{blue})$ such that clause $i$ is
$\sigma$-critical but supported by another literal $l \ne j$ under $\sigma$. Each such clause contains precisely $k - 2$ literals
$h \in [k] \setminus \{j, l\}$ such that $f_\tau(i, h) = f_\sigma(i, h) = \mathrm{blue}$. If $t_{\mathrm{red}}(i,j) = 1$, then $t(i, h) = 0$ for all $h$, which
occurs with probability $(1-a)^{k-2}$. Similarly, if $t_{\mathrm{red}}(i,j) = 0$, then $t(i, h) = 1$ for all $h$, the probability of
which equals $a^{k-2}$.
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'0red,red ■ 

ipr 

'0red,blue 

V^bluGjred 
,blue 

77(a) 



(1 - A:)Aln2 + (r - A) ln(l - (fc + 1)2^"'=), 

S'red,red(fc - 1) [Ored.red ln(a) + (1 - Clred.red) ln(l — o) 

7 (A: - 2) [a-y ln(l - a) + (1 - 0^.) In a] , 

ffred.blue — 7i 

gred, blue (Qred, blue — arj) 



-51n(2*^"^ -k- 1) +a.^ln (^1 - a*"'^ - (1 - a)*^"^ - (fc - - a)'''" 

+ (1 - ae)Cln (1 - a''-'' - (1 - a)'="' - (fc - l)a'="'(l - a)) , 



.9blue,red " 7; 

pblue ,red (Oblue.red — "rl) 



-(1 - Qc)Cln (1 - a'''"' - (1 - a)''-^ - [k - l)a'''""(l - a)) , 
1 + fc - 77(a) 



+ 



k-l 



(r - 2A + ffred.red) In . 2* - 1 

a*^ + (1 - a)*-' + ka(l - a)*^"^ + fca'="^(l - a) + 
fc(a(l - a)'="' + (1 - a)a''-^ + a'= + (1 - a)*^ + 

(fc - l)a'''"'(l - a)" + (fc - l)a'(l - a)*"'"') 



where 



Fig. 1. The explicit expressions for Proposition D.18.



The term $\psi_{\mathrm{red,blue}}$ deals with clauses $i$ such that $(i, j) \in f_\tau^{-1}(\mathrm{red}) \cap f_\sigma^{-1}(\mathrm{blue}) \setminus \Gamma$ for some $j$. The
total number of such clauses is $\zeta n$. For each of these indices $i$ we have $f_\sigma(i, l) = \mathrm{blue}$ for all $l \in [k]$
(because $(i,j) \notin \Gamma$). Suppose that $t_{\mathrm{red}}(i,j) = 1$. Since clause $i$ is non-critical under $\sigma = \mathbf{1}$, it contains a
total of $h \ge 2$ literals whose signs agree with that of literal $j$. In order for clause $i$ to be supported by literal
$j$ under $t$, the $h - 1$ other literals $l$ whose signs agree with that of literal $j$ must take the value $t(i, l) = 0$,
while the $k - h$ remaining literals $l$ must take value $t(i,l) = 1$. Summing over $h$ and taking into account the
distribution of the signs, we obtain the overall probability in the case $t_{\mathrm{red}}(i, j) = 1$:



k-2 

E 



2^ - 2/fc - 2 



(1 - ay 



-k-l 



\k-l 



\k-2 



The case $t_{\mathrm{red}}(i,j) = 0$ is analogous to the above, and a similar argument yields $\psi_{\mathrm{blue,red}}$.

Finally, $\psi_{\mathrm{blue,blue}}$ accounts for all clauses $i$ such that $f_\sigma(i,j) = f_\tau(i,j) = \mathrm{blue}$ for all $j \in [k]$. There
are precisely $(r - 2\lambda + g_{\mathrm{red,red}})n$ such clauses. Each of them is supposed to be assigned such that under both
$\sigma = \mathbf{1}$ and $t$ at least two literals evaluate to "true" and at least two evaluate to "false". Given the distribution
of the signature $s$ and of $t$, the probability of this event equals $\eta(a)$. However, we are already conditioning
on the event that each clause contains at least one literal of either sign (this probability is accounted for by
$\psi_\sigma$). Hence, the conditional probability of the desired outcome equals . Since the clauses are
independent, the overall probability is given by $\psi_{\mathrm{blue,blue}}$. □

Proof of Lemma D.12. The assertion simply follows from Proposition D.18 by Taylor expanding the right
hand side of (D.6) around $\frac12 \mathbf{1}$. □
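The Taylor expansion can be illustrated on the simplest term. The sketch below takes the bracket $\alpha_{\mathrm{red,red}} \ln a + (1 - \alpha_{\mathrm{red,red}}) \ln(1-a)$ from the $\psi_{\mathrm{red,red}}$ expression in Figure 1 and compares it with its second-order expansion around $\alpha_{\mathrm{red,red}} = a = \frac12$, namely $-\ln 2 + 4\delta_{\mathrm{red,red}}\delta_{\mathrm{blue,blue}} - 2\delta_{\mathrm{blue,blue}}^2$ with $\delta = \cdot - \frac12$. The expansion is worked out here for illustration; the function names are hypothetical and the other $\psi$ terms are not treated.

```python
import math

def psi_rr_bracket(alpha_rr, a):
    """The bracket alpha_rr * ln(a) + (1 - alpha_rr) * ln(1 - a)."""
    return alpha_rr * math.log(a) + (1 - alpha_rr) * math.log(1 - a)

def psi_rr_second_order(alpha_rr, a):
    """Second-order Taylor expansion of the bracket around alpha_rr = a = 1/2."""
    d_rr = alpha_rr - 0.5
    d_bb = a - 0.5
    return -math.log(2) + 4 * d_rr * d_bb - 2 * d_bb * d_bb

exact = psi_rr_bracket(0.52, 0.51)
approx = psi_rr_second_order(0.52, 0.51)
print(exact, approx, abs(exact - approx))
```

The deviation from the value at $\frac12\mathbf{1}$ consists of the mixed term $\delta_{\mathrm{red,red}}\delta_{\mathrm{blue,blue}}$ and the square $\delta_{\mathrm{blue,blue}}^2$, which are the two monomials multiplying $g_{\mathrm{red,red}}$ in Lemma D.12.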
D.6 Proof of Corollary D.13

We begin with the following observation, which hinges upon the assumption that we work with a good 
profile. 

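Proposition D.19. There is an absolute constant $c > 0$ such that for a random $d$ chosen from $D$ the
following is true w.h.p. Let $\mathcal{C}$ be a good profile, let $\frac12 - 2^{-k/2} \le \alpha \le \frac12$, and let $\tau$ be chosen uniformly at
random from all assignments such that $\mathrm{dist}(\sigma, \tau) = \alpha n$. Then for any $\delta > 0$ we have

\[
\begin{aligned}
P\left[ |\alpha_{\mathrm{blue,blue}} - \alpha| > \delta \right] &\le \exp(-c \delta^2 n), \\
P\left[ |\alpha_{\mathrm{red,blue}} - \alpha| > \delta \right] &\le \exp(-c \delta^2 n), \\
P\left[ |\alpha_{\mathrm{blue,red}} - \alpha| > \delta \right] &\le \exp(-c \delta^2 n), \\
P\left[ |\alpha_{\mathrm{red,red}} - \alpha| > \delta \right] &\le \exp(-c\, g_{\mathrm{red,red}} \delta^2 n / k^2), \\
P\left[ |\alpha_\Gamma - \alpha| > \delta \right] &\le \exp(-\gamma \delta^2 n / k^2).
\end{aligned}
\]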

Proof. Recall that $\sigma = \mathbf{1}$. By standard monotonicity arguments, we may assume that $\tau$ is obtained by
letting $\tau(x) = 0$ with probability $\alpha$ and $\tau(x) = 1$ with probability $1 - \alpha$ for all $x \in V$ independently.
Furthermore, since by standard arguments the degrees $d_x$ are asymptotically independently Poisson, w.h.p.
the degree sequence $d$ is such that



x&v \ xev 



n < 10(/cr)^n. 



(D.7) 



Hence, we are going to assume that (D.7) is satisfied.

We begin by analyzing $\alpha_{\mathrm{blue,blue}}$. Switching the value $\tau(x)$ of a single variable $x \in V$ can only alter
the random variable $\alpha_{\mathrm{blue,blue}}$ by $d_x / (g_{\mathrm{blue,blue}} n)$. Therefore, by Azuma's inequality and (D.7), for any
$t > 0$



P [ I "blue, blue - E [Oblue.blue]! > ^/(S'blue.bluen)] < exp 



< exp 



10n{kr) 



(D.8) 



Since ^biue.biue < ^krn for any good profile, (ID. 81) yields the first inequality. 

With respect to $\alpha_{\mathrm{red,blue}}$, recall that in a good profile each $x \in V$ satisfies $\mathrm{red}_\tau(x) \le k$ (recall that
$\mathrm{red}_\tau$ depends on the profile $\mathcal{C}$ only). Therefore, Azuma's inequality yields

r +2 -\ 

P [l^red.blue " E [Ored.blue]! > V(5red,bluef^)] < exp 



t 

k^n 



(D.9) 



Since $g_{\mathrm{red,blue}}\, n \ge ckn$ for a certain constant $c > 0$, the second claim follows from (D.9). A similar argument
yields the third inequality.

Regarding $\alpha_{\mathrm{red,red}}$, we recall that given $\mathcal{C}$ we know how many "red/red balls" each variable has. Since $\mathcal{C}$
is good, their total number is $g_{\mathrm{red,red}}\, n \le k^2 2^{-k} n$. In particular, there are no more than $g_{\mathrm{red,red}}\, n$ variables
that have a "red/red ball" in the first place. Furthermore, switching $\tau(x)$ for a single variable $x$ can alter
$\alpha_{\mathrm{red,red}}$ by at most $k / (g_{\mathrm{red,red}}\, n)$, because $\mathrm{red}_\tau(x), \mathrm{red}_\sigma(x) \le k$ for all $x$ as $\mathcal{C}$ is good. Therefore, by
Azuma's inequality

+2 



P [|ared,red " E [Ored.red]! > i/lS'red.red"-)] < exp 



(D.IO) 



^^5'red,red'T' 

(The $g_{\mathrm{red,red}}$ in the denominator mirrors the fact that no more than $g_{\mathrm{red,red}}\, n$ variables have a "red/red ball".)
Setting $t = \delta g_{\mathrm{red,red}}\, n$ yields the fourth inequality. The last inequality follows from a similar argument. □
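The concentration arguments in this proof all follow one pattern: a sum in which flipping the value of one variable $x$ changes the outcome by at most some $c_x$. As a hedged illustration, the sketch below uses the bounded-differences (McDiarmid) form of the inequality, $P[|X - E X| > t] \le 2\exp(-2t^2 / \sum_x c_x^2)$, in place of the exact constants of (D.8) to (D.10), and compares it with a seeded simulation. The weights play the role of the degrees $d_x$ and are invented for the example.

```python
import math
import random

def bounded_difference_bound(weights, t):
    """McDiarmid bound 2 * exp(-2 t^2 / sum c_x^2) on P[|X - E X| > t]."""
    return 2.0 * math.exp(-2.0 * t * t / sum(c * c for c in weights))

def empirical_tail(weights, alpha, t, trials, rng):
    """Fraction of trials in which X = sum of c_x over selected x deviates by more than t."""
    mean = alpha * sum(weights)
    hits = 0
    for _ in range(trials):
        # Each variable is selected independently with probability alpha.
        x = sum(c for c in weights if rng.random() < alpha)
        if abs(x - mean) > t:
            hits += 1
    return hits / trials

weights = [1 + (x % 5) for x in range(200)]   # hypothetical "degrees" in {1,...,5}
rng = random.Random(0)                        # fixed seed for reproducibility
bound = bounded_difference_bound(weights, 60)
freq = empirical_tail(weights, 0.5, 60, 2000, rng)
print(freq, bound)
```

Dividing the deviation threshold by the normalising factor, as in the choice $t = \delta g_{\mathrm{red,red}}\, n$ above, turns such a bound on the raw count into a bound on the corresponding overlap component.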

Finally, Corollary D.13 follows by comparing the bounds on the deviations of the individual components
of $\boldsymbol{\alpha}$ from Proposition D.19 with Lemma D.12 and Lemma D.11. □